<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Apache Beam – Learn about Beam</title><link>/documentation/</link><description>Recent content in Learn about Beam on Apache Beam</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/documentation/index.xml" rel="self" type="application/rss+xml"/><item><title>Documentation: About Beam ML</title><link>/documentation/ml/about-ml/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/ml/about-ml/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="about-beam-ml">About Beam ML&lt;/h1>
&lt;table>
&lt;tr>
&lt;td>
&lt;a>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.html#apache_beam.ml.inference.RunInference"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;/a>
&lt;/td>
&lt;td>
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/extensions/python/transforms/RunInference.html">
&lt;img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="30px"
alt="Javadoc" />
Javadoc
&lt;/a>
&lt;/td>
&lt;/tr>
&lt;/table>
&lt;p>You can use Apache Beam to:&lt;/p>
&lt;ul>
&lt;li>Process large volumes of data, both for preprocessing and for inference.&lt;/li>
&lt;li>Experiment with your data during the exploration phase of your project.&lt;/li>
&lt;li>Upscale your data pipelines as part of your ML ops ecosystem in a production environment.&lt;/li>
&lt;li>Run your model in production on a varying data load, both in batch and streaming.&lt;/li>
&lt;/ul>
&lt;h2 id="aiml-workloads">AI/ML workloads&lt;/h2>
&lt;p>You can use Apache Beam for data validation, data preprocessing, model validation, and model deployment and inference.&lt;/p>
&lt;p>&lt;img src="/images/ml-workflows.svg" alt="Overview of AI/ML building blocks and where Apache Beam can be used">&lt;/p>
&lt;ol>
&lt;li>Data ingestion: Incoming new data is either stored in your file system or database, or published to a messaging queue.&lt;/li>
&lt;li>&lt;strong>Data validation&lt;/strong>: After you receive your data, check the quality of the data. For example, you might want to detect outliers and calculate standard deviations and class distributions.&lt;/li>
&lt;li>&lt;strong>Data preprocessing&lt;/strong>: After you validate your data, transform the data so that it&amp;rsquo;s ready to use to train your model.&lt;/li>
&lt;li>Model training: When your data is ready, train your AI/ML model. This step is typically repeated multiple times, depending on the quality of your trained model.&lt;/li>
&lt;li>Model validation: Before you deploy your model, validate its performance and accuracy.&lt;/li>
&lt;li>&lt;strong>Model deployment&lt;/strong>: Deploy your model, using it to run inference on new or existing data.&lt;/li>
&lt;/ol>
&lt;p>To keep your model up to date and performing well as your data grows and evolves, run these steps multiple times. In addition, you can apply ML ops to your project to automate the AI/ML workflows throughout the model and data lifecycle. Use orchestrators to automate this flow and to handle the transition between the different building blocks in your project.&lt;/p>
&lt;h2 id="use-runinference">Use RunInference&lt;/h2>
&lt;p>The RunInference API is a &lt;code>PTransform&lt;/code> optimized for machine learning inferences that lets you efficiently use ML models in your pipelines. The API includes the following features:&lt;/p>
&lt;ul>
&lt;li>To efficiently feed your model, dynamically batches inputs based on pipeline throughput using Apache Beam&amp;rsquo;s &lt;code>BatchElements&lt;/code> transform.&lt;/li>
&lt;li>To balance memory and throughput usage, determines the optimal number of models to load using a central model manager. Shares these models across threads and processes as needed to maximize throughput.&lt;/li>
&lt;li>Ensures that your pipeline uses the most recently deployed version of your model with the &lt;a href="#automatic-model-refresh">Automatic model refresh&lt;/a> feature.&lt;/li>
&lt;li>Supports &lt;a href="#use-pre-trained-models">multiple frameworks and model hubs&lt;/a>, including TensorFlow, PyTorch, scikit-learn, XGBoost, Hugging Face, TensorFlow Hub, Vertex AI, TensorRT, and ONNX.&lt;/li>
&lt;li>Supports arbitrary frameworks using a &lt;a href="#use-custom-models">custom model handler&lt;/a>.&lt;/li>
&lt;li>Supports &lt;a href="#multi-model-pipelines">multi-model pipelines&lt;/a>.&lt;/li>
&lt;li>Lets you use GPUs on supported runners to increase inference speed. For more information, see &lt;a href="https://cloud.google.com/dataflow/docs/gpu">GPUs with Dataflow&lt;/a> in the Dataflow documentation.&lt;/li>
&lt;/ul>
&lt;h3 id="support-and-limitations">Support and limitations&lt;/h3>
&lt;ul>
&lt;li>The RunInference API is supported in Apache Beam 2.40.0 and later versions.&lt;/li>
&lt;li>Model handlers are available for PyTorch, scikit-learn, TensorFlow, Hugging Face, Vertex AI, ONNX, TensorRT, and XGBoost. You can also use a custom model handler.&lt;/li>
&lt;li>The RunInference API supports batch and streaming pipelines.&lt;/li>
&lt;li>The RunInference API supports both remote inference and inference local to the runner worker.&lt;/li>
&lt;/ul>
&lt;h3 id="batchelements-ptransform">BatchElements PTransform&lt;/h3>
&lt;p>To take advantage of the vectorized inference optimizations that many models implement, RunInference applies the &lt;code>BatchElements&lt;/code> transform as an intermediate step before making predictions. This transform batches elements together. RunInference then combines each batch in a framework-specific way. For example, for numpy &lt;code>ndarrays&lt;/code>, it calls &lt;code>numpy.stack()&lt;/code>, and for torch &lt;code>Tensor&lt;/code> elements, it calls &lt;code>torch.stack()&lt;/code>.&lt;/p>
&lt;p>To customize the settings for &lt;code>beam.BatchElements&lt;/code>, in &lt;code>ModelHandler&lt;/code>, override the &lt;code>batch_elements_kwargs&lt;/code> function. For example, use &lt;code>min_batch_size&lt;/code> to set the lowest number of elements per batch or &lt;code>max_batch_size&lt;/code> to set the highest number of elements per batch.&lt;/p>
&lt;p>For more information, see the &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements">&lt;code>BatchElements&lt;/code> transform documentation&lt;/a>.&lt;/p>
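&lt;p>The batching idea described above can be illustrated with a minimal, Beam-free sketch. It only shows why grouping elements lets a vectorized model be called once per batch. The names &lt;code>batch_elements&lt;/code> and &lt;code>fake_vectorized_model&lt;/code> are hypothetical stand-ins, not part of Apache Beam, and the real &lt;code>BatchElements&lt;/code> transform chooses batch sizes dynamically rather than using a fixed size.&lt;/p>

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_elements(elements: Iterable[T], max_batch_size: int) -> Iterator[List[T]]:
    """Group consecutive elements into lists of at most max_batch_size items.

    Hypothetical helper for illustration only; not the Beam transform.
    """
    batch: List[T] = []
    for element in elements:
        batch.append(element)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def fake_vectorized_model(batch: List[List[float]]) -> List[float]:
    """Hypothetical model: a single call scores every row in the batch."""
    return [sum(row) for row in batch]


examples = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
batches = list(batch_elements(examples, max_batch_size=2))
# One model call per batch instead of one call per element.
predictions = [p for b in batches for p in fake_vectorized_model(b)]
```

&lt;p>In this sketch, five examples result in three model calls. In a real pipeline, the batch size bounds are the values that you return from &lt;code>batch_elements_kwargs&lt;/code>.&lt;/p>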
&lt;h3 id="shared-helper-class">Shared helper class&lt;/h3>
&lt;p>Using the &lt;code>Shared&lt;/code> class within the RunInference implementation makes it possible to load the model only once per process and share it with all DoFn instances created in that process. This feature reduces memory consumption and model loading time. For more information, see the
&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20">&lt;code>Shared&lt;/code> class documentation&lt;/a>.&lt;/p>
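&lt;p>The load-once behavior can be sketched without Beam as follows. This is a conceptual illustration under stated assumptions, not the implementation of the &lt;code>Shared&lt;/code> class: &lt;code>ModelCache&lt;/code>, its &lt;code>acquire&lt;/code> signature, and &lt;code>load_model&lt;/code> are hypothetical names chosen for the example.&lt;/p>

```python
import threading
from typing import Any, Callable, Dict


class ModelCache:
    """Hypothetical process-wide cache that loads each model at most once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._models: Dict[str, Any] = {}

    def acquire(self, tag: str, loader: Callable[[], Any]) -> Any:
        # The lock keeps concurrent callers from loading the same model twice.
        with self._lock:
            if tag not in self._models:
                self._models[tag] = loader()
            return self._models[tag]


load_count = 0


def load_model() -> dict:
    """Stand-in for an expensive model load."""
    global load_count
    load_count += 1
    return {"weights": [0.1, 0.2, 0.3]}


cache = ModelCache()
# Two callers in the same process ask for the same model.
model_a = cache.acquire("my_model", load_model)
model_b = cache.acquire("my_model", load_model)
```

&lt;p>Both callers receive the same object, and the loader runs only once, which is the source of the memory and load-time savings described above.&lt;/p>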
&lt;h3 id="modify-a-python-pipeline-to-use-an-ml-model">Modify a Python pipeline to use an ML model&lt;/h3>
&lt;p>To use the RunInference transform, add the following code to your pipeline:&lt;/p>
&lt;pre tabindex="0">&lt;code>import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
with pipeline as p:
  predictions = (
      p
      | &amp;#39;Read&amp;#39; &amp;gt;&amp;gt; beam.ReadFromSource(&amp;#39;a_source&amp;#39;)
      | &amp;#39;RunInference&amp;#39; &amp;gt;&amp;gt; RunInference(&amp;lt;model_handler&amp;gt;))
&lt;/code>&lt;/pre>&lt;p>Replace &lt;code>model_handler&lt;/code> with the model handler setup code.&lt;/p>
&lt;p>To import models, you need to configure a &lt;code>ModelHandler&lt;/code> object that wraps the underlying model. Which model handler you import depends on the framework and type of data structure that contains the inputs. The &lt;code>ModelHandler&lt;/code> object also allows you to set environment variables needed for inference using the &lt;code>env_vars&lt;/code> keyword argument. The following examples show some model handlers that you might want to import.&lt;/p>
&lt;pre tabindex="0">&lt;code>from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
from tfx_bsl.public.beam.run_inference import CreateModelHandler
&lt;/code>&lt;/pre>&lt;h3 id="use-pre-trained-models">Use pre-trained models&lt;/h3>
&lt;p>This section provides requirements for using pre-trained models with PyTorch, Scikit-learn, and TensorFlow.&lt;/p>
&lt;h4 id="pytorch">PyTorch&lt;/h4>
&lt;p>You need to provide a path to a file that contains the model&amp;rsquo;s saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:&lt;/p>
&lt;ol>
&lt;li>Download the pre-trained weights and host them in a location that the pipeline can access.&lt;/li>
&lt;li>Pass the path of the model weights to the PyTorch &lt;code>ModelHandler&lt;/code> by using the following code: &lt;code>state_dict_path=&amp;lt;path_to_weights&amp;gt;&lt;/code>.&lt;/li>
&lt;/ol>
&lt;p>See &lt;a href="https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch.ipynb">this notebook&lt;/a>
that illustrates running PyTorch models with Apache Beam.&lt;/p>
&lt;h4 id="scikit-learn">Scikit-learn&lt;/h4>
&lt;p>You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:&lt;/p>
&lt;ol>
&lt;li>Download the pickled model class and host it in a location that the pipeline can access.&lt;/li>
&lt;li>Pass the path of the model to the Sklearn &lt;code>ModelHandler&lt;/code> by using the following code:
&lt;code>model_uri=&amp;lt;path_to_pickled_file&amp;gt;&lt;/code> and &lt;code>model_file_type=&amp;lt;ModelFileType&amp;gt;&lt;/code>, where you can specify
&lt;code>ModelFileType.PICKLE&lt;/code> or &lt;code>ModelFileType.JOBLIB&lt;/code>, depending on how the model was serialized.&lt;/li>
&lt;/ol>
&lt;p>See &lt;a href="https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_sklearn.ipynb">this notebook&lt;/a>
that illustrates running Scikit-learn models with Apache Beam.&lt;/p>
&lt;h4 id="tensorflow">TensorFlow&lt;/h4>
&lt;p>To use TensorFlow with the RunInference API, you have two options:&lt;/p>
&lt;ol>
&lt;li>Use the built-in TensorFlow model handlers in the Apache Beam SDK: &lt;code>TFModelHandlerNumpy&lt;/code> and &lt;code>TFModelHandlerTensor&lt;/code>.
&lt;ul>
&lt;li>Depending on the type of input for your model, use &lt;code>TFModelHandlerNumpy&lt;/code> for &lt;code>numpy&lt;/code> input or &lt;code>TFModelHandlerTensor&lt;/code> for &lt;code>tf.Tensor&lt;/code> input.&lt;/li>
&lt;li>Use TensorFlow 2.7 or later.&lt;/li>
&lt;li>Pass the path of the model to the TensorFlow &lt;code>ModelHandler&lt;/code> by using &lt;code>model_uri=&amp;lt;path_to_trained_model&amp;gt;&lt;/code>.&lt;/li>
&lt;li>Alternatively, you can pass the path to saved weights of the trained model, a function to build the model using &lt;code>create_model_fn=&amp;lt;function&amp;gt;&lt;/code>, and set the &lt;code>model_type=ModelType.SAVED_WEIGHTS&lt;/code>.
See &lt;a href="https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_tensorflow.ipynb">this notebook&lt;/a> that illustrates running TensorFlow models with built-in model handlers.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Use &lt;code>tfx_bsl&lt;/code>.
&lt;ul>
&lt;li>Use this approach if your model input is of type &lt;code>tf.Example&lt;/code>.&lt;/li>
&lt;li>Use &lt;code>tfx_bsl&lt;/code> version 1.10.0 or later.&lt;/li>
&lt;li>Create a model handler using &lt;code>tfx_bsl.public.beam.run_inference.CreateModelHandler()&lt;/code>.&lt;/li>
&lt;li>Use the model handler with the &lt;a href="/releases/pydoc/current/apache_beam.ml.inference.base.html">&lt;code>apache_beam.ml.inference.base.RunInference&lt;/code>&lt;/a> transform.
See &lt;a href="https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_tensorflow.ipynb">this notebook&lt;/a>
that illustrates running TensorFlow models with Apache Beam and tfx-bsl.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;h3 id="use-custom-models">Use custom models&lt;/h3>
&lt;p>To use a model that isn&amp;rsquo;t covered by one of the supported frameworks, create your own &lt;code>ModelHandler&lt;/code> or &lt;code>KeyedModelHandler&lt;/code> with logic to load your model and use it to run inference.
The RunInference API is designed to work with any custom machine learning model.&lt;/p>
&lt;p>A simple example can be found in &lt;a href="https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb">this notebook&lt;/a>.
The &lt;code>load_model&lt;/code> method shows how to load the model using a popular &lt;code>spaCy&lt;/code> package while &lt;code>run_inference&lt;/code> shows how to run the inference on a batch of examples.&lt;/p>
&lt;h3 id="runinference-patterns">RunInference Patterns&lt;/h3>
&lt;p>This section suggests patterns and best practices that you can use to make your inference pipelines simpler,
more robust, and more efficient.&lt;/p>
&lt;h4 id="use-a-keyed-modelhandler-object">Use a keyed ModelHandler object&lt;/h4>
&lt;p>If a key is attached to the examples, wrap &lt;code>KeyedModelHandler&lt;/code> around the &lt;code>ModelHandler&lt;/code> object:&lt;/p>
&lt;pre tabindex="0">&lt;code>from apache_beam.ml.inference.base import KeyedModelHandler
keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
with pipeline as p:
  data = p | beam.Create([
      (&amp;#39;img1&amp;#39;, torch.tensor([[1,2,3],[4,5,6],...])),
      (&amp;#39;img2&amp;#39;, torch.tensor([[1,2,3],[4,5,6],...])),
      (&amp;#39;img3&amp;#39;, torch.tensor([[1,2,3],[4,5,6],...])),
  ])
  predictions = data | RunInference(keyed_model_handler)
&lt;/code>&lt;/pre>&lt;p>If you are unsure if your data is keyed, you can use &lt;code>MaybeKeyedModelHandler&lt;/code>.&lt;/p>
&lt;p>You can also use a &lt;code>KeyedModelHandler&lt;/code> to load several different models based on their associated key.
The following example loads a model by using &lt;code>config1&lt;/code>. That model is used for inference for all examples associated
with &lt;code>key1&lt;/code>. It loads a second model by using &lt;code>config2&lt;/code>. That model is used for all examples associated with &lt;code>key2&lt;/code> and &lt;code>key3&lt;/code>.&lt;/p>
&lt;pre tabindex="0">&lt;code>from apache_beam.ml.inference.base import KeyedModelHandler, KeyModelMapping
keyed_model_handler = KeyedModelHandler([
    KeyModelMapping([&amp;#39;key1&amp;#39;], PytorchModelHandlerTensor(&amp;lt;config1&amp;gt;)),
    KeyModelMapping([&amp;#39;key2&amp;#39;, &amp;#39;key3&amp;#39;], PytorchModelHandlerTensor(&amp;lt;config2&amp;gt;))
])
with pipeline as p:
  data = p | beam.Create([
      (&amp;#39;key1&amp;#39;, torch.tensor([[1,2,3],[4,5,6],...])),
      (&amp;#39;key2&amp;#39;, torch.tensor([[1,2,3],[4,5,6],...])),
      (&amp;#39;key3&amp;#39;, torch.tensor([[1,2,3],[4,5,6],...])),
  ])
  predictions = data | RunInference(keyed_model_handler)
&lt;/code>&lt;/pre>&lt;p>For a more detailed example, see the notebook
&lt;a href="https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb">Run ML inference with multiple differently-trained models&lt;/a>.&lt;/p>
&lt;p>Loading multiple models at the same time increases the risk of out-of-memory errors (OOMs). By default, &lt;code>KeyedModelHandler&lt;/code> doesn&amp;rsquo;t
limit the number of models loaded into memory at the same time. If the models don&amp;rsquo;t all fit into memory,
your pipeline might fail with an out of memory error. To avoid this issue, use the &lt;code>max_models_per_worker_hint&lt;/code> parameter
to set the maximum number of models that can be loaded into memory at the same time.&lt;/p>
&lt;p>The following example loads at most two models per SDK worker process at a time. It unloads models that aren&amp;rsquo;t
currently in use.&lt;/p>
&lt;pre tabindex="0">&lt;code>mhs = [
    KeyModelMapping([&amp;#39;key1&amp;#39;], PytorchModelHandlerTensor(&amp;lt;config1&amp;gt;)),
    KeyModelMapping([&amp;#39;key2&amp;#39;, &amp;#39;key3&amp;#39;], PytorchModelHandlerTensor(&amp;lt;config2&amp;gt;)),
    KeyModelMapping([&amp;#39;key4&amp;#39;], PytorchModelHandlerTensor(&amp;lt;config3&amp;gt;)),
    KeyModelMapping([&amp;#39;key5&amp;#39;, &amp;#39;key6&amp;#39;, &amp;#39;key7&amp;#39;], PytorchModelHandlerTensor(&amp;lt;config4&amp;gt;)),
]
keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
&lt;/code>&lt;/pre>&lt;p>Runners that have multiple SDK worker processes on a given machine load at most
&lt;code>max_models_per_worker_hint*&amp;lt;num worker processes&amp;gt;&lt;/code> models onto the machine.&lt;/p>
&lt;p>Leave enough space for the models and any additional memory needs from other transforms.
Because the memory might not be released immediately after a model is offloaded,
leaving an additional buffer is recommended.&lt;/p>
&lt;p>&lt;strong>Note&lt;/strong>: Having many models but a small &lt;code>max_models_per_worker_hint&lt;/code> can cause &lt;em>memory thrashing&lt;/em>, where
a large amount of execution time is used to swap models in and out of memory. To reduce the likelihood and impact
of memory thrashing, if you&amp;rsquo;re using a distributed runner, insert a
&lt;a href="https://beam.apache.org/documentation/transforms/python/aggregation/groupbykey/">&lt;code>GroupByKey&lt;/code>&lt;/a> transform before your
inference step. The &lt;code>GroupByKey&lt;/code> transform reduces thrashing by ensuring that elements with the same key and model are
collocated on the same worker.&lt;/p>
&lt;p>For more information, see &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.KeyedModelHandler">&lt;code>KeyedModelHandler&lt;/code>&lt;/a>.&lt;/p>
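&lt;p>To see why a small model limit can cause thrashing and why grouping elements by key helps, consider the following Beam-free sketch. It assumes a least-recently-used eviction policy purely for illustration. &lt;code>LruModelCache&lt;/code> and &lt;code>count_loads&lt;/code> are hypothetical names, and the sketch makes no claim about how &lt;code>KeyedModelHandler&lt;/code> chooses which model to unload.&lt;/p>

```python
from collections import OrderedDict
from typing import Any, Callable, List


class LruModelCache:
    """Hypothetical cache that keeps at most max_models models in memory."""

    def __init__(self, max_models: int) -> None:
        self.max_models = max_models
        self._models: "OrderedDict[str, Any]" = OrderedDict()
        self.load_log: List[str] = []

    def get(self, model_id: str, loader: Callable[[str], Any]) -> Any:
        if model_id in self._models:
            # Cache hit: mark the model as most recently used.
            self._models.move_to_end(model_id)
            return self._models[model_id]
        if len(self._models) >= self.max_models:
            # Cache full: unload the least recently used model.
            self._models.popitem(last=False)
        self.load_log.append(model_id)
        self._models[model_id] = loader(model_id)
        return self._models[model_id]


def count_loads(keys: List[str], max_models: int) -> int:
    """Count how many model loads a sequence of keyed elements triggers."""
    cache = LruModelCache(max_models)
    for key in keys:
        cache.get(key, loader=lambda model_id: object())
    return len(cache.load_log)


interleaved = ["a", "b", "c", "a", "b", "c"]
grouped = sorted(interleaved)  # Same elements, collocated by key.

interleaved_loads = count_loads(interleaved, max_models=2)
grouped_loads = count_loads(grouped, max_models=2)
```

&lt;p>With three models and room for two, the interleaved order reloads a model for every element, while the grouped order loads each model only once.&lt;/p>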
&lt;h4 id="use-the-predictionresult-object">Use the PredictionResult object&lt;/h4>
&lt;p>When doing a prediction in Apache Beam, the output &lt;code>PCollection&lt;/code> includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.&lt;/p>
&lt;p>The &lt;code>PredictionResult&lt;/code> object is a &lt;code>NamedTuple&lt;/code> that contains both the input and the inferences, named &lt;code>example&lt;/code> and &lt;code>inference&lt;/code>, respectively. When keys are passed with the input data to the RunInference transform, the output &lt;code>PCollection&lt;/code> returns a &lt;code>Tuple[str, PredictionResult]&lt;/code>, which is the key and the &lt;code>PredictionResult&lt;/code> object. Your pipeline interacts with a &lt;code>PredictionResult&lt;/code> object in steps after the RunInference transform.&lt;/p>
&lt;pre tabindex="0">&lt;code>class PostProcessor(beam.DoFn):
  def process(self, element: Tuple[str, PredictionResult]):
    key, prediction_result = element
    inputs = prediction_result.example
    predictions = prediction_result.inference
    # Post-processing logic
    result = ...
    yield (key, result)
with pipeline as p:
  output = (
      p | &amp;#39;Read&amp;#39; &amp;gt;&amp;gt; beam.ReadFromSource(&amp;#39;a_source&amp;#39;)
      | &amp;#39;PyTorchRunInference&amp;#39; &amp;gt;&amp;gt; RunInference(&amp;lt;keyed_model_handler&amp;gt;)
      | &amp;#39;ProcessOutput&amp;#39; &amp;gt;&amp;gt; beam.ParDo(PostProcessor()))
&lt;/code>&lt;/pre>&lt;p>If you need to use this object explicitly, include the following line in your pipeline to import the object:&lt;/p>
&lt;pre tabindex="0">&lt;code>from apache_beam.ml.inference.base import PredictionResult
&lt;/code>&lt;/pre>&lt;p>For more information, see the &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65">&lt;code>PredictionResult&lt;/code> documentation&lt;/a>.&lt;/p>
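&lt;p>The shape of the keyed output can be explored outside a pipeline with a small stand-in. The &lt;code>PredictionResultSketch&lt;/code> class below is a hypothetical substitute that only mirrors the &lt;code>example&lt;/code> and &lt;code>inference&lt;/code> fields described above. It is not the Beam class, and the 0.5 threshold and sample scores are made up for illustration.&lt;/p>

```python
from typing import Any, List, NamedTuple, Tuple


class PredictionResultSketch(NamedTuple):
    """Hypothetical stand-in with the two fields described in the text."""
    example: Any
    inference: Any


def post_process(element: Tuple[str, PredictionResultSketch]) -> Tuple[str, str]:
    """Unpack the key and the result, then turn a score into a label."""
    key, result = element
    label = "positive" if result.inference > 0.5 else "negative"
    return key, f"{result.example} -> {label}"


# Keyed output, shaped like Tuple[str, PredictionResult].
keyed_results: List[Tuple[str, PredictionResultSketch]] = [
    ("id1", PredictionResultSketch(example="great product", inference=0.9)),
    ("id2", PredictionResultSketch(example="broke in a day", inference=0.1)),
]

processed = [post_process(element) for element in keyed_results]
```

&lt;p>Because the result is a named tuple, the input and the prediction can be read by field name, and the key travels alongside them.&lt;/p>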
&lt;h4 id="automatic-model-refresh">Automatic model refresh&lt;/h4>
&lt;p>To automatically update the model being used with the RunInference &lt;code>PTransform&lt;/code> without stopping the pipeline, pass a &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata">&lt;code>ModelMetadata&lt;/code>&lt;/a> side input &lt;code>PCollection&lt;/code> to the RunInference input parameter &lt;code>model_metadata_pcoll&lt;/code>.&lt;/p>
&lt;p>&lt;code>ModelMetadata&lt;/code> is a &lt;code>NamedTuple&lt;/code> containing:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>model_id&lt;/code>: Unique identifier for the model. This can be a file path or a URL where the model can be accessed. It is used to load the model for inference. The URL or file path must be in the compatible format so that the respective &lt;code>ModelHandlers&lt;/code> can load the models without errors.&lt;/p>
&lt;p>For example, &lt;code>PyTorchModelHandler&lt;/code> initially loads a model using weights and a model class. If you pass in weights from a different model class when you update the model using side inputs, the model doesn&amp;rsquo;t load properly, because it expects the weights from the original model class.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>model_name&lt;/code>: Human-readable name for the model. You can use this name to identify the model in the metrics generated by the RunInference transform.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Use cases:&lt;/p>
&lt;ul>
&lt;li>Use &lt;code>WatchFilePattern&lt;/code> as side input to the RunInference &lt;code>PTransform&lt;/code> to automatically update the ML model. For more information, see &lt;a href="https://beam.apache.org/documentation/ml/side-input-updates">Use &lt;code>WatchFilePattern&lt;/code> as side input to auto-update ML models in RunInference&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>The side input &lt;code>PCollection&lt;/code> must follow the &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html?highlight=assingleton#apache_beam.pvalue.AsSingleton">&lt;code>AsSingleton&lt;/code>&lt;/a> view to avoid errors.&lt;/p>
&lt;p>&lt;strong>Note&lt;/strong>: If the main &lt;code>PCollection&lt;/code> emits inputs and a side input has yet to receive inputs, the main &lt;code>PCollection&lt;/code> is buffered until there is
an update to the side input. This could happen with global windowed side inputs with data driven triggers, such as &lt;code>AfterCount&lt;/code> and &lt;code>AfterProcessingTime&lt;/code>. Until the side input is updated, emit the default or initial model ID that is used to pass the respective &lt;code>ModelHandler&lt;/code> as a side input.&lt;/p>
&lt;h4 id="preprocess-and-postprocess-your-records">Preprocess and postprocess your records&lt;/h4>
&lt;p>With RunInference, you can add preprocessing and postprocessing operations to your transform.
To apply preprocessing operations, use &lt;code>with_preprocess_fn&lt;/code> on your model handler:&lt;/p>
&lt;pre tabindex="0">&lt;code>inference = pcoll | RunInference(model_handler.with_preprocess_fn(lambda x : do_something(x)))
&lt;/code>&lt;/pre>&lt;p>To apply postprocessing operations, use &lt;code>with_postprocess_fn&lt;/code> on your model handler:&lt;/p>
&lt;pre tabindex="0">&lt;code>inference = pcoll | RunInference(model_handler.with_postprocess_fn(lambda x : do_something_to_result(x)))
&lt;/code>&lt;/pre>&lt;p>You can also chain multiple pre- and postprocessing operations:&lt;/p>
&lt;pre tabindex="0">&lt;code>inference = pcoll | RunInference(
    model_handler.with_preprocess_fn(
        lambda x : do_something(x)
    ).with_preprocess_fn(
        lambda x : do_something_else(x)
    ).with_postprocess_fn(
        lambda x : do_something_after_inference(x)
    ).with_postprocess_fn(
        lambda x : do_something_else_after_inference(x)
    ))
&lt;/code>&lt;/pre>&lt;p>The preprocessing function is run before batching and inference. This function maps your input &lt;code>PCollection&lt;/code>
to the base input type of the model handler. If you apply multiple preprocessing functions, they run on your original
&lt;code>PCollection&lt;/code> in the order of last applied to first applied.&lt;/p>
&lt;p>The postprocessing function runs after inference. This function maps the output type of the base model handler
to your desired output type. If you apply multiple postprocessing functions, they run on your original
inference result in the order of first applied to last applied.&lt;/p>
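&lt;p>The ordering rules above are easy to get backwards, so the following Beam-free sketch traces them. &lt;code>SketchHandler&lt;/code> and &lt;code>step&lt;/code> are hypothetical names that imitate only the chaining behavior described in this section. The sketch is not how Beam implements model handlers.&lt;/p>

```python
from typing import Any, Callable, List, Tuple

Fn = Callable[[Any], Any]


class SketchHandler:
    """Hypothetical handler that imitates the documented chaining order."""

    def __init__(self, infer: Fn, pre: Tuple[Fn, ...] = (), post: Tuple[Fn, ...] = ()) -> None:
        self._infer = infer
        self._pre = pre
        self._post = post

    def with_preprocess_fn(self, fn: Fn) -> "SketchHandler":
        # The most recently applied preprocessing function runs first.
        return SketchHandler(self._infer, (fn,) + self._pre, self._post)

    def with_postprocess_fn(self, fn: Fn) -> "SketchHandler":
        # Postprocessing functions run in the order they were applied.
        return SketchHandler(self._infer, self._pre, self._post + (fn,))

    def run(self, element: Any) -> Any:
        for fn in self._pre:
            element = fn(element)
        result = self._infer(element)
        for fn in self._post:
            result = fn(result)
        return result


trace: List[str] = []


def step(name: str) -> Fn:
    """Build a pass-through function that records when it runs."""
    def fn(value: Any) -> Any:
        trace.append(name)
        return value
    return fn


handler = (
    SketchHandler(step("inference"))
    .with_preprocess_fn(step("pre_1"))
    .with_preprocess_fn(step("pre_2"))
    .with_postprocess_fn(step("post_1"))
    .with_postprocess_fn(step("post_2"))
)
handler.run("record")
```

&lt;p>After the call, &lt;code>trace&lt;/code> shows that &lt;code>pre_2&lt;/code> runs before &lt;code>pre_1&lt;/code>, while &lt;code>post_1&lt;/code> runs before &lt;code>post_2&lt;/code>.&lt;/p>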
&lt;h4 id="handle-errors">Handle errors&lt;/h4>
&lt;p>To handle errors robustly while using RunInference, you can use a &lt;em>dead-letter queue&lt;/em>. The dead-letter queue outputs failed records into a separate &lt;code>PCollection&lt;/code> for further processing.
This &lt;code>PCollection&lt;/code> can then be analyzed and sent to a storage system, where it can be reviewed and resubmitted to the pipeline, or discarded.
RunInference has built-in support for dead-letter queues. You can use a dead-letter queue by applying &lt;code>with_exception_handling&lt;/code> to your RunInference transform:&lt;/p>
&lt;pre tabindex="0">&lt;code>main, other = pcoll | RunInference(model_handler).with_exception_handling()
other.failed_inferences | beam.Map(print) # insert logic to handle failed records here
&lt;/code>&lt;/pre>&lt;p>You can also apply this pattern to RunInference transforms with associated pre- and postprocessing operations:&lt;/p>
&lt;pre tabindex="0">&lt;code>main, other = pcoll | RunInference(model_handler.with_preprocess_fn(f1).with_postprocess_fn(f2)).with_exception_handling()
other.failed_preprocessing[0] | beam.Map(print) # handles failed preprocess operations, indexed in the order in which they were applied
other.failed_inferences | beam.Map(print) # handles failed inferences
other.failed_postprocessing[0] | beam.Map(print) # handles failed postprocess operations, indexed in the order in which they were applied
&lt;/code>&lt;/pre>&lt;h4 id="run-inference-from-a-java-pipeline">Run inference from a Java pipeline&lt;/h4>
&lt;p>The RunInference API is available with the Beam Java SDK versions 2.41.0 and later through Apache Beam&amp;rsquo;s &lt;a href="/documentation/programming-guide/#multi-language-pipelines">Multi-language Pipelines framework&lt;/a>. For information about the Java wrapper transform, see &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/transforms/RunInference.java">RunInference.java&lt;/a>. To try it out, see the &lt;a href="https://github.com/apache/beam/tree/master/examples/multi-language">Java Sklearn Mnist Classification example&lt;/a>. Additionally, see &lt;a href="https://beam.apache.org/documentation/ml/multi-language-inference/">Using RunInference from Java SDK&lt;/a> for an example of a composite Python transform that uses the RunInference API along with preprocessing and postprocessing from a Beam Java SDK pipeline.&lt;/p>
&lt;h2 id="custom-inference">Custom Inference&lt;/h2>
&lt;p>The RunInference API doesn&amp;rsquo;t currently support making remote inference calls using, for example, the Natural Language API or the Cloud Vision API. Therefore, in order to use these remote APIs with Apache Beam, you need to write custom inference calls. The &lt;a href="https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/custom_remote_inference.ipynb">Remote inference in Apache Beam notebook&lt;/a> shows how to implement a custom remote inference call using &lt;code>beam.DoFn&lt;/code>. When you implement a remote inference for real life projects, consider the following factors:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>API quotas and the heavy load you might incur on your external API. To optimize the calls to an external API, you can configure &lt;code>PipelineOptions&lt;/code> to limit the parallel calls to the external remote API.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Be prepared to encounter, identify, and handle failure as gracefully as possible. Use techniques like exponential backoff and dead-letter queues (unprocessed messages queues).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>When running inference with an external API, batch your input together to allow for more efficient execution.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Consider monitoring and measuring the performance of a pipeline when deploying, because monitoring can provide insight into the status and health of the application.&lt;/p>
&lt;/li>
&lt;/ul>
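&lt;p>The failure-handling advice above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a Beam API: &lt;code>call_with_backoff&lt;/code>, &lt;code>run_remote_inference&lt;/code>, and &lt;code>flaky_api&lt;/code> are hypothetical names, and the retry count and delays are arbitrary. In a pipeline, logic like this would typically live inside the custom &lt;code>beam.DoFn&lt;/code>.&lt;/p>

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def call_with_backoff(fn: Callable[[Any], Any], request: Any, max_attempts: int = 4,
                      base_delay_s: float = 0.01,
                      sleep: Callable[[float], Any] = time.sleep) -> Any:
    """Retry fn(request), doubling the wait after each failed attempt."""
    for attempt in range(max_attempts):
        try:
            return fn(request)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of retries: let the caller route the record.
            sleep(base_delay_s * (2 ** attempt))


def run_remote_inference(requests: List[Any], fn: Callable[[Any], Any],
                         sleep: Callable[[float], Any] = time.sleep
                         ) -> Tuple[List[Tuple[Any, Any]], List[Tuple[Any, str]]]:
    """Return (results, dead_letters); failed records are kept, not dropped."""
    results: List[Tuple[Any, Any]] = []
    dead_letters: List[Tuple[Any, str]] = []
    for request in requests:
        try:
            results.append((request, call_with_backoff(fn, request, sleep=sleep)))
        except Exception as exc:
            dead_letters.append((request, repr(exc)))
    return results, dead_letters


attempts: Dict[str, int] = {}


def flaky_api(text: str) -> str:
    """Hypothetical remote API: fails once per input, and always for one input."""
    attempts[text] = attempts.get(text, 0) + 1
    if text == "always-fails":
        raise RuntimeError("service unavailable")
    if attempts[text] == 1:
        raise TimeoutError("transient timeout")
    return text.upper()


delays: List[float] = []  # Record the waits instead of actually sleeping.
results, dead_letters = run_remote_inference(
    ["hello", "always-fails"], flaky_api, sleep=delays.append)
```

&lt;p>The transient failure succeeds on retry, while the persistent failure ends up in &lt;code>dead_letters&lt;/code> after four attempts, where it can be inspected or resubmitted later.&lt;/p>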
&lt;h2 id="multi-model-pipelines">Multi-model pipelines&lt;/h2>
&lt;p>Use the RunInference transform to add multiple inference models to your pipeline. Multi-model pipelines can be useful for A/B testing or for building out cascade models made up of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more. For more information, see &lt;a href="https://beam.apache.org/documentation/ml/multi-model-pipelines/">Multi-model pipelines&lt;/a>.&lt;/p>
&lt;h3 id="ab-pattern">A/B Pattern&lt;/h3>
&lt;pre tabindex="0">&lt;code>with pipeline as p:
  data = p | &amp;#39;Read&amp;#39; &amp;gt;&amp;gt; beam.ReadFromSource(&amp;#39;a_source&amp;#39;)
  model_a_predictions = data | RunInference(&amp;lt;model_handler_A&amp;gt;)
  model_b_predictions = data | RunInference(&amp;lt;model_handler_B&amp;gt;)
&lt;/code>&lt;/pre>&lt;p>Where &lt;code>model_handler_A&lt;/code> and &lt;code>model_handler_B&lt;/code> are the model handler setup code.&lt;/p>
&lt;h3 id="cascade-pattern">Cascade Pattern&lt;/h3>
&lt;pre tabindex="0">&lt;code>with pipeline as p:
  data = p | &amp;#39;Read&amp;#39; &amp;gt;&amp;gt; beam.ReadFromSource(&amp;#39;a_source&amp;#39;)
  model_a_predictions = data | RunInference(&amp;lt;model_handler_A&amp;gt;)
  model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(&amp;lt;model_handler_B&amp;gt;)
&lt;/code>&lt;/pre>&lt;p>Where &lt;code>model_handler_A&lt;/code> and &lt;code>model_handler_B&lt;/code> are the model handler setup code.&lt;/p>
&lt;h3 id="use-resource-hints-for-different-model-requirements">Use Resource Hints for Different Model Requirements&lt;/h3>
&lt;p>When using multiple models in a single pipeline, different models may have different memory or worker SKU requirements.
Resource hints allow you to provide information to a runner about the compute resource requirements for each step in your
pipeline.&lt;/p>
&lt;p>For example, the following snippet extends the previous cascade pattern with hints for each RunInference call
to specify RAM and hardware accelerator requirements:&lt;/p>
&lt;pre tabindex="0">&lt;code>with pipeline as p:
  data = p | &amp;#39;Read&amp;#39; &amp;gt;&amp;gt; beam.ReadFromSource(&amp;#39;a_source&amp;#39;)
  model_a_predictions = data | RunInference(&amp;lt;model_handler_A&amp;gt;).with_resource_hints(min_ram=&amp;#34;20GB&amp;#34;)
  model_b_predictions = (
      model_a_predictions
      | beam.Map(some_post_processing)
      | RunInference(&amp;lt;model_handler_B&amp;gt;).with_resource_hints(
          min_ram=&amp;#34;4GB&amp;#34;,
          accelerator=&amp;#34;type:nvidia-tesla-k80;count:1;install-nvidia-driver&amp;#34;))
&lt;/code>&lt;/pre>&lt;p>For more information on resource hints, see &lt;a href="/documentation/runtime/resource-hints/">Resource hints&lt;/a>.&lt;/p>
&lt;h2 id="model-validation">Model validation&lt;/h2>
&lt;p>Model validation allows you to benchmark your model’s performance against a previously unseen dataset. You can extract chosen metrics, create visualizations, log metadata, and compare the performance of different models with the end goal of validating whether your model is ready to deploy. Beam provides support for running model evaluation on a TensorFlow model directly inside your pipeline.&lt;/p>
&lt;p>The &lt;a href="/documentation/ml/model-evaluation">ML model evaluation&lt;/a> page shows how to integrate model evaluation as part of your pipeline by using &lt;a href="https://www.tensorflow.org/tfx/guide/tfma">TensorFlow Model Analysis (TFMA)&lt;/a>.&lt;/p>
&lt;h2 id="troubleshooting">Troubleshooting&lt;/h2>
&lt;p>This section lists issues that you might encounter with your pipeline or job and suggests how to fix them.&lt;/p>
&lt;h3 id="unable-to-batch-tensor-elements">Unable to batch tensor elements&lt;/h3>
&lt;p>RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, so samples passed to the &lt;code>RunInference&lt;/code> transform must be the same dimension or length. If you provide images of different sizes or word embeddings of different lengths, the following error might occur:&lt;/p>
&lt;p>&lt;code>File &amp;quot;/beam/sdks/python/apache_beam/ml/inference/pytorch_inference.py&amp;quot;, line 232, in run_inference batched_tensors = torch.stack(key_to_tensor_list[key]) RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [10] at entry 1 [while running 'PyTorchRunInference/ParDo(_RunInferenceDoFn)']&lt;/code>&lt;/p>
&lt;p>To avoid this issue, either use elements of the same size, or disable batching.&lt;/p>
&lt;p>&lt;strong>Option 1: Use elements of the same size&lt;/strong>&lt;/p>
&lt;p>Use elements of the same size or resize the inputs. For computer vision applications, resize image inputs so that they have the same dimensions. For natural language processing (NLP) applications that have text of varying length, resize the text or word embeddings, for example by padding or truncating them, so that they have the same length. With texts of varying length, resizing might not be possible. In that case, disable batching (see option 2).&lt;/p>
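&lt;p>As a rough illustration of resizing NLP inputs, the following sketch pads or truncates lists of token IDs to one fixed length before they would reach &lt;code>RunInference&lt;/code>. The helper name &lt;code>pad_or_truncate&lt;/code>, the target length of 4, and the padding value of 0 are assumptions made for this example and are not part of the Beam API.&lt;/p>

```python
from typing import List, Sequence


def pad_or_truncate(tokens: Sequence[int], length: int, pad_value: int = 0) -> List[int]:
    """Return `tokens` cut or right-padded to exactly `length` items."""
    trimmed = list(tokens[:length])
    return trimmed + [pad_value] * (length - len(trimmed))


# Two inputs of different lengths (hypothetical token IDs).
examples = [[5, 8, 13], [7, 2, 9, 4, 1, 6]]
fixed = [pad_or_truncate(e, length=4) for e in examples]
print(fixed)  # [[5, 8, 13, 0], [7, 2, 9, 4]]
```

&lt;p>In a pipeline, a helper like this could be applied with &lt;code>beam.Map&lt;/code> ahead of the inference step, so that every element in a batch has the same shape.&lt;/p>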
&lt;p>&lt;strong>Option 2: Disable batching&lt;/strong>&lt;/p>
&lt;p>Disable batching by overriding the &lt;code>batch_elements_kwargs&lt;/code> function in your ModelHandler so that it sets the maximum batch size to one: &lt;code>max_batch_size=1&lt;/code>. For more information, see
&lt;a href="/documentation/ml/about-ml/#batchelements-ptransform">BatchElements PTransforms&lt;/a>. For an example, see the &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py">language modeling example&lt;/a>.&lt;/p>
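&lt;p>A minimal sketch of this override follows, written as a mixin so that it runs without Beam installed. The class name &lt;code>NoBatchingMixin&lt;/code> is hypothetical; only &lt;code>batch_elements_kwargs&lt;/code> and &lt;code>max_batch_size&lt;/code> come from the description above.&lt;/p>

```python
from typing import Any, Dict


class NoBatchingMixin:
    """Hypothetical mixin: caps every RunInference batch at one element."""

    def batch_elements_kwargs(self) -> Dict[str, Any]:
        # RunInference forwards these keyword arguments to BatchElements.
        return {"max_batch_size": 1}


# Assumed usage in a real pipeline (requires apache_beam and torch):
#   class NoBatchHandler(NoBatchingMixin, PytorchModelHandlerTensor):
#       pass
kwargs = NoBatchingMixin().batch_elements_kwargs()
print(kwargs)
```

&lt;p>The mixin is listed before the concrete handler class so that its &lt;code>batch_elements_kwargs&lt;/code> takes precedence in the method resolution order.&lt;/p>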
&lt;h2 id="related-links">Related links&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/python/elementwise/runinference">RunInference transforms&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference">RunInference API pipeline examples&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_basic.ipynb">RunInference public codelab&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/apache/beam/tree/master/examples/notebooks/beam-ml">RunInference notebooks&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://s.apache.org/beam-community-metrics/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1">RunInference benchmarks&lt;/a>&lt;/li>
&lt;/ul>
</description></item><item><title>Documentation: AI Platform integration patterns</title><link>/documentation/patterns/ai-platform/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/patterns/ai-platform/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="ai-platform-integration-patterns">AI Platform integration patterns&lt;/h1>
&lt;p>This page describes common patterns in pipelines with Google Cloud AI Platform transforms.&lt;/p>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="analysing-the-structure-and-meaning-of-text">Analysing the structure and meaning of text&lt;/h2>
&lt;p>This section shows how to use &lt;a href="https://cloud.google.com/natural-language">Google Cloud Natural Language API&lt;/a> to perform text analysis.&lt;/p>
&lt;p>Beam provides a PTransform called &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.ml.gcp.naturallanguageml.html#apache_beam.ml.gcp.naturallanguageml.AnnotateText">AnnotateText&lt;/a>. The transform takes a PCollection of type &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.ml.gcp.naturallanguageml.html#apache_beam.ml.gcp.naturallanguageml.Document">Document&lt;/a>. Each Document object contains information about the text: its content, whether it is plain text or HTML, an optional language hint, and other settings.
&lt;code>AnnotateText&lt;/code> produces a response object of type &lt;code>AnnotateTextResponse&lt;/code>, as returned from the API. &lt;code>AnnotateTextResponse&lt;/code> is a protobuf message with many attributes, some of which are complex structures.&lt;/p>
&lt;p>Here is an example of a pipeline that creates an in-memory PCollection of strings, converts each string to a Document object, and invokes the Natural Language API. Then, for each response object, a function is called to extract certain results of the analysis.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">features&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">nlp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">types&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AnnotateTextRequest&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Features&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">extract_entities&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">extract_document_sentiment&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">extract_entity_sentiment&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">extract_syntax&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">responses&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;My experience so far has been fantastic! &amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;I&lt;/span>&lt;span class="se">\&amp;#39;&lt;/span>&lt;span class="s1">d really recommend this product.&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">nlp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Document&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">type&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;PLAIN_TEXT&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">nlp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AnnotateText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">features&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">responses&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">extract_sentiments&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Parse sentiments to JSON&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">json&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">dumps&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Write sentiments&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WriteToText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;sentiments.txt&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">responses&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">extract_entities&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Parse entities to JSON&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">json&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">dumps&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Write entities&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WriteToText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;entities.txt&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">responses&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">analyze_dependency_tree&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Parse adjacency list to JSON&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">json&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">dumps&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Write adjacency list&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WriteToText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;adjacency_list.txt&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">AnnotateTextRequest&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Features&lt;/span> &lt;span class="n">features&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">AnnotateTextRequest&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Features&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">newBuilder&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">setExtractEntities&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">setExtractDocumentSentiment&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">setExtractEntitySentiment&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">setExtractSyntax&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">AnnotateText&lt;/span> &lt;span class="n">annotateText&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">AnnotateText&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">newBuilder&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">setFeatures&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">features&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">AnnotateTextResponse&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">responses&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Create&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;My experience so far has been fantastic, &amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">+&lt;/span> &lt;span class="s">&amp;#34;I\&amp;#39;d really recommend this product.&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptor&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Document&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">SerializableFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Document&lt;/span>&lt;span class="o">&amp;gt;)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">input&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Document&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">newBuilder&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">setContent&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">input&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">setType&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Document&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Type&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">PLAIN_TEXT&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">annotateText&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">responses&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptor&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextSentiments&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)).&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">extractSentiments&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">((&lt;/span>&lt;span class="n">SerializableFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">TextSentiments&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;)&lt;/span> &lt;span class="n">TextSentiments&lt;/span>&lt;span class="o">::&lt;/span>&lt;span class="n">toJson&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;sentiments.txt&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">responses&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">maps&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">extractEntities&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">()).&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">mapEntitiesToJson&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;entities.txt&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">responses&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">lists&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">maps&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">lists&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">()))))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">analyzeDependencyTree&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">()).&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">mapDependencyTreesToJson&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;adjacency_list.txt&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="extracting-sentiments">Extracting sentiments&lt;/h3>
&lt;p>The following is part of the response object returned from the API. Sentence-level sentiments are in the &lt;code>sentences&lt;/code> attribute. &lt;code>sentences&lt;/code> behaves like a standard Python sequence, so core language features such as iteration and slicing work on it. The overall sentiment is in the &lt;code>document_sentiment&lt;/code> attribute.&lt;/p>
&lt;pre tabindex="0">&lt;code>sentences {
  text {
    content: &amp;#34;My experience so far has been fantastic!&amp;#34;
  }
  sentiment {
    magnitude: 0.8999999761581421
    score: 0.8999999761581421
  }
}
sentences {
  text {
    content: &amp;#34;I\&amp;#39;d really recommend this product.&amp;#34;
    begin_offset: 41
  }
  sentiment {
    magnitude: 0.8999999761581421
    score: 0.8999999761581421
  }
}
...many lines omitted
document_sentiment {
  magnitude: 1.899999976158142
  score: 0.8999999761581421
}
&lt;/code>&lt;/pre>&lt;p>The function for extracting information about sentence-level and document-level sentiments is shown in the next code snippet.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">extract_sentiments&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">response&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="c1"># Map each sentence to its score and add the document-level score.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="s1">&amp;#39;sentences&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="n">sentence&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">text&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">content&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">sentence&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sentiment&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">score&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="p">}&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">sentence&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sentences&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="s1">&amp;#39;document_sentiment&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">document_sentiment&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">score&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">extractSentiments&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">(&lt;/span>&lt;span class="n">SerializableFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">AnnotateTextResponse&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">TextSentiments&lt;/span>&lt;span class="o">&amp;gt;)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">annotateTextResponse&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TextSentiments&lt;/span> &lt;span class="n">sentiments&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">TextSentiments&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sentiments&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setDocumentSentiment&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">annotateTextResponse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getDocumentSentiment&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getScore&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Float&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">sentenceSentimentsMap&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">annotateTextResponse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getSentencesList&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">stream&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">collect&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Collectors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">toMap&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">Sentence&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getText&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getContent&lt;/span>&lt;span class="o">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">Sentence&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getSentiment&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getScore&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sentiments&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setSentenceSentiments&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">sentenceSentimentsMap&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">sentiments&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">};&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>The snippet loops over &lt;code>sentences&lt;/code> and, for each sentence, extracts the sentiment score.&lt;/p>
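&lt;p>The Python snippet shows only the &lt;code>return&lt;/code> statement. As a minimal sketch, the following wraps it in a function and runs it against a stand-in object built with &lt;code>types.SimpleNamespace&lt;/code>, so no API call is needed. The names &lt;code>extract_sentiments&lt;/code> and &lt;code>make_sentence&lt;/code> are hypothetical, and the stand-in mimics only the fields used here, not the real response type.&lt;/p>

```python
import json
from types import SimpleNamespace


def extract_sentiments(response):
    """Return per-sentence scores and the document-level score."""
    return {
        'sentences': [{
            sentence.text.content: sentence.sentiment.score
        } for sentence in response.sentences],
        'document_sentiment': response.document_sentiment.score,
    }


def make_sentence(content, score):
    """Hypothetical stand-in for one sentence of an API response."""
    return SimpleNamespace(
        text=SimpleNamespace(content=content),
        sentiment=SimpleNamespace(score=score))


# Stand-in response that carries the same values as the response shown earlier.
response = SimpleNamespace(
    sentences=[
        make_sentence('My experience so far has been fantastic!',
                      0.8999999761581421),
        make_sentence("I'd really recommend this product.",
                      0.8999999761581421),
    ],
    document_sentiment=SimpleNamespace(score=0.8999999761581421))

result = extract_sentiments(response)
print(json.dumps(result))
```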
&lt;p>The output is:&lt;/p>
&lt;pre tabindex="0">&lt;code>{&amp;#34;sentences&amp;#34;: [{&amp;#34;My experience so far has been fantastic!&amp;#34;: 0.8999999761581421}, {&amp;#34;I&amp;#39;d really recommend this product.&amp;#34;: 0.8999999761581421}], &amp;#34;document_sentiment&amp;#34;: 0.8999999761581421}
&lt;/code>&lt;/pre>&lt;h3 id="extracting-entities">Extracting entities&lt;/h3>
&lt;p>The next function inspects the response for entities and returns the names and the types of those entities.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">return&lt;/span> &lt;span class="p">[{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;name&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">entity&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;type&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">nlp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">enums&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Entity&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Type&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">entity&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">type&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">entity&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">entities&lt;/span>&lt;span class="p">]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">extractEntities&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">(&lt;/span>&lt;span class="n">SerializableFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">AnnotateTextResponse&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">annotateTextResponse&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">annotateTextResponse&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getEntitiesList&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">stream&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">collect&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Collectors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">toMap&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Entity&lt;/span>&lt;span class="o">::&lt;/span>&lt;span class="n">getName&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">Entity&lt;/span> &lt;span class="n">e&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">e&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getType&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">toString&lt;/span>&lt;span class="o">()));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Entities are found in the &lt;code>entities&lt;/code> attribute. As before, &lt;code>entities&lt;/code> is a sequence, so a list comprehension is a natural choice. The trickiest part is interpreting the entity types. The Natural Language API defines entity types as an enum, but the response object returns them as integers. To get a human-readable name, instantiate &lt;code>naturallanguageml.enums.Entity.Type&lt;/code> (written as &lt;code>nlp.enums.Entity.Type&lt;/code> in the Python snippet) with the integer and read its &lt;code>name&lt;/code> attribute.&lt;/p>
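&lt;p>The integer-to-name lookup can be tried without calling the API. The sketch below is an illustration under stated assumptions: &lt;code>EntityType&lt;/code> is a hypothetical stand-in enum that lists only a few members, its integer values are chosen for illustration, and the response is a &lt;code>types.SimpleNamespace&lt;/code> object rather than a real API response.&lt;/p>

```python
import enum
import json
from types import SimpleNamespace


class EntityType(enum.IntEnum):
    """Hypothetical stand-in for the API's entity type enum (subset only)."""
    UNKNOWN = 0
    CONSUMER_GOOD = 6
    OTHER = 7


def extract_entities(response):
    """Return the name and the human-readable type of every entity."""
    return [{
        'name': entity.name,
        # Calling the enum class with an integer looks up the matching member.
        'type': EntityType(entity.type).name,
    } for entity in response.entities]


# Stand-in response: entity types are stored as plain integers.
response = SimpleNamespace(entities=[
    SimpleNamespace(name='experience', type=7),
    SimpleNamespace(name='product', type=6),
])

entities = extract_entities(response)
print(json.dumps(entities))
```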
&lt;p>The output is:&lt;/p>
&lt;pre tabindex="0">&lt;code>[{&amp;#34;name&amp;#34;: &amp;#34;experience&amp;#34;, &amp;#34;type&amp;#34;: &amp;#34;OTHER&amp;#34;}, {&amp;#34;name&amp;#34;: &amp;#34;product&amp;#34;, &amp;#34;type&amp;#34;: &amp;#34;CONSUMER_GOOD&amp;#34;}]
&lt;/code>&lt;/pre>&lt;h3 id="accessing-sentence-dependency-tree">Accessing sentence dependency tree&lt;/h3>
&lt;p>The following code loops over the sentences and, for each sentence, builds an adjacency list that represents a dependency tree. For more information about dependency trees, see &lt;a href="https://cloud.google.com/natural-language/docs/morphology#dependency_trees">Morphology &amp;amp; Dependency Trees&lt;/a>.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">collections&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">defaultdict&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">adjacency_lists&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">index&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">sentence&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">sentences&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">adjacency_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">defaultdict&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">list&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sentence_begin&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">sentence&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">text&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">begin_offset&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sentence_end&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">sentence_begin&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">sentence&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">text&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">content&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">while&lt;/span> &lt;span class="n">index&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tokens&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="ow">and&lt;/span> \
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tokens&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">text&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">begin_offset&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="n">sentence_end&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">token&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tokens&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">head_token_index&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">token&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">dependency_edge&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">head_token_index&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">head_token_text&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tokens&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">head_token_index&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">text&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">content&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">adjacency_list&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">head_token_text&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">append&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">token&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">text&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">content&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">index&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">adjacency_lists&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">append&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">adjacency_list&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">analyzeDependencyTree&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">SerializableFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">AnnotateTextResponse&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&amp;gt;)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">response&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">adjacencyLists&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">ArrayList&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">index&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">0&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">Sentence&lt;/span> &lt;span class="n">s&lt;/span> &lt;span class="o">:&lt;/span> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getSentencesList&lt;/span>&lt;span class="o">())&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">adjacencyMap&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">HashMap&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">sentenceBegin&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getText&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getBeginOffset&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">sentenceEnd&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">sentenceBegin&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getText&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getContent&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">length&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">1&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">while&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">index&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTokensCount&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTokens&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">getText&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getBeginOffset&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="n">sentenceEnd&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Token&lt;/span> &lt;span class="n">token&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTokensList&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">get&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">index&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">headTokenIndex&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">token&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getDependencyEdge&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getHeadTokenIndex&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="n">headTokenContent&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTokens&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">headTokenIndex&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">getText&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getContent&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">adjacencyList&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">adjacencyMap&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getOrDefault&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">headTokenContent&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">ArrayList&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">adjacencyList&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">add&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">token&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getText&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getContent&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">adjacencyMap&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">put&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">headTokenContent&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">adjacencyList&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">index&lt;/span>&lt;span class="o">++;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">adjacencyLists&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">add&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">adjacencyMap&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">adjacencyLists&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">};&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
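&lt;p>To experiment with the adjacency-list idea offline, here is a self-contained sketch. It is a variant of the snippet above, not the same code: instead of advancing a single index, it scans all tokens for each sentence and keeps those whose offset falls inside the sentence, which is simpler but slower for long documents. The helper names, the stand-in objects, and the head-token indexes are assumptions made for illustration.&lt;/p>

```python
from collections import defaultdict
from types import SimpleNamespace


def make_token(content, begin_offset, head_token_index):
    """Hypothetical stand-in for one token of an API response."""
    return SimpleNamespace(
        text=SimpleNamespace(content=content, begin_offset=begin_offset),
        dependency_edge=SimpleNamespace(head_token_index=head_token_index))


def build_adjacency_lists(response):
    """Build one mapping of head word to dependent words per sentence."""
    adjacency_lists = []
    for sentence in response.sentences:
        begin = sentence.text.begin_offset
        # Character offsets covered by this sentence.
        span = range(begin, begin + len(sentence.text.content))
        adjacency_list = defaultdict(list)
        for token in response.tokens:
            if token.text.begin_offset in span:
                head = response.tokens[token.dependency_edge.head_token_index]
                adjacency_list[head.text.content].append(token.text.content)
        adjacency_lists.append(dict(adjacency_list))
    return adjacency_lists


first = 'My experience so far has been fantastic!'
second = "I'd really recommend this product."
response = SimpleNamespace(
    sentences=[
        SimpleNamespace(text=SimpleNamespace(content=first, begin_offset=0)),
        SimpleNamespace(text=SimpleNamespace(content=second, begin_offset=41)),
    ],
    # Each token: text, character offset, index of its head token (assumed).
    tokens=[
        make_token('My', 0, 1), make_token('experience', 3, 5),
        make_token('so', 14, 3), make_token('far', 17, 5),
        make_token('has', 21, 5), make_token('been', 25, 5),
        make_token('fantastic', 30, 5), make_token('!', 39, 5),
        make_token('I', 41, 11), make_token("'d", 42, 11),
        make_token('really', 45, 11), make_token('recommend', 52, 11),
        make_token('this', 62, 13), make_token('product', 67, 11),
        make_token('.', 74, 11),
    ])

trees = build_adjacency_lists(response)
```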
&lt;p>The output is shown below. For readability, token indexes are replaced by the text they refer to:&lt;/p>
&lt;pre tabindex="0">&lt;code>[
{
&amp;#34;experience&amp;#34;: [
&amp;#34;My&amp;#34;
],
&amp;#34;been&amp;#34;: [
&amp;#34;experience&amp;#34;,
&amp;#34;far&amp;#34;,
&amp;#34;has&amp;#34;,
&amp;#34;been&amp;#34;,
&amp;#34;fantastic&amp;#34;,
&amp;#34;!&amp;#34;
],
&amp;#34;far&amp;#34;: [
&amp;#34;so&amp;#34;
]
},
{
&amp;#34;recommend&amp;#34;: [
&amp;#34;I&amp;#34;,
&amp;#34;&amp;#39;d&amp;#34;,
&amp;#34;really&amp;#34;,
&amp;#34;recommend&amp;#34;,
&amp;#34;product&amp;#34;,
&amp;#34;.&amp;#34;
],
&amp;#34;product&amp;#34;: [
&amp;#34;this&amp;#34;
]
}
]
&lt;/code>&lt;/pre>&lt;h2 id="getting-predictions">Getting predictions&lt;/h2>
&lt;p>This section shows how to use &lt;a href="https://cloud.google.com/ai-platform/prediction/docs/overview">Google Cloud AI Platform Prediction&lt;/a> to make predictions about new data from a cloud-hosted machine learning model.&lt;/p>
&lt;p>&lt;a href="https://github.com/tensorflow/tfx-bsl">tfx_bsl&lt;/a> is a library with a Beam PTransform called &lt;code>RunInference&lt;/code>. &lt;code>RunInference&lt;/code> can perform inference by sending data to an external service endpoint. When using a service endpoint, the transform takes a PCollection of type &lt;code>tf.train.Example&lt;/code> and, for every batch of elements, sends a request to AI Platform Prediction. The batch size is computed automatically. For details on how Beam chooses the batch size, see the docstring for &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html?highlight=batchelements#apache_beam.transforms.util.BatchElements">BatchElements&lt;/a>. Currently, the transform does not support &lt;code>tf.train.SequenceExample&lt;/code> as input, but work is in progress.&lt;/p>
&lt;p>The transform produces a PCollection of type &lt;code>PredictionLog&lt;/code>, which contains predictions.&lt;/p>
&lt;p>Before getting started, deploy a TensorFlow model to AI Platform Prediction. The cloud service manages the infrastructure needed to handle prediction requests in an efficient and scalable way. Note that the transform supports only TensorFlow models. For more information, see &lt;a href="https://cloud.google.com/ai-platform/prediction/docs/exporting-savedmodel-for-prediction">Exporting a SavedModel for prediction&lt;/a>.&lt;/p>
&lt;p>Once a machine learning model is deployed, prepare a list of instances to get predictions for. To send binary data, make sure that the name of the input ends in &lt;code>_bytes&lt;/code>. Data for such inputs is base64-encoded before the request is sent.&lt;/p>
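&lt;p>To make the &lt;code>_bytes&lt;/code> convention concrete, the sketch below shows roughly what base64-encoding such an input looks like. It is an illustration, not the transform&amp;rsquo;s actual code: &lt;code>encode_instance&lt;/code> is a hypothetical helper, and wrapping the encoded value in a &lt;code>b64&lt;/code> key is an assumption about the request format.&lt;/p>

```python
import base64
import json


def encode_instance(instance):
    """Hypothetical helper: base64-encode values whose name ends in _bytes."""
    encoded = {}
    for name, value in instance.items():
        if name.endswith('_bytes'):
            # Binary values are not valid JSON, so send them as base64 text.
            encoded[name] = {'b64': base64.b64encode(value).decode('ascii')}
        else:
            encoded[name] = value
    return encoded


instance = encode_instance({'input_bytes': b'hello', 'label': 'greeting'})
request_body = json.dumps({'instances': [instance]})
print(request_body)
```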
&lt;h3 id="example">Example&lt;/h3>
&lt;p>Here is an example of a pipeline that reads input instances from a file, converts JSON objects to &lt;code>tf.train.Example&lt;/code> objects, and sends the data to AI Platform Prediction. The content of the file can look like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>{&amp;#34;input&amp;#34;: &amp;#34;the quick brown&amp;#34;}
{&amp;#34;input&amp;#34;: &amp;#34;la bruja le&amp;#34;}
&lt;/code>&lt;/pre>&lt;p>The example creates &lt;code>tf.train.BytesList&lt;/code> instances, so it expects bytes-like strings as input. The transform also supports other data types, such as &lt;code>tf.train.FloatList&lt;/code> and &lt;code>tf.train.Int64List&lt;/code>.&lt;/p>
&lt;p>Here is the code:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">json&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">tensorflow&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">tf&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">tfx_bsl.beam.run_inference&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">RunInference&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">tfx_bsl.proto&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">model_spec_pb2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">convert_json_to_tf_example&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">json_obj&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">samples&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">json&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">loads&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">json_obj&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Collect every key of the JSON object as a feature.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">feature&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">text&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">samples&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">items&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">value&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">train&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Feature&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bytes_list&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">train&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">BytesList&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">value&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">text&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">encode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;utf-8&amp;#39;&lt;/span>&lt;span class="p">)]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">feature&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">value&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">train&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Example&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">features&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">train&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Features&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">feature&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">feature&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;gs://my-bucket/samples.json&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">convert_json_to_tf_example&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">RunInference&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">model_spec_pb2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">InferenceEndpoint&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">model_endpoint_spec&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">model_spec_pb2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AIPlatformPredictionModelSpec&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">project_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;my-project-id&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">model_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;my-model-name&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">version_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;my-model-version&amp;#39;&lt;/span>&lt;span class="p">))))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Getting predictions is not yet available for Java. [https://github.com/apache/beam/issues/20001]
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div></description></item><item><title>Documentation: Anomaly Detection</title><link>/documentation/ml/anomaly-detection/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/ml/anomaly-detection/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="anomaly-detection-example">Anomaly Detection Example&lt;/h1>
&lt;p>The anomaly detection example demonstrates how to set up an anomaly detection pipeline that reads text from Pub/Sub in real time, and then detects anomalies using a trained HDBSCAN clustering model.&lt;/p>
&lt;h2 id="dataset-for-anomaly-detection">Dataset for Anomaly Detection&lt;/h2>
&lt;p>This example uses a dataset called &lt;a href="https://huggingface.co/datasets/emotion">emotion&lt;/a> that contains 20,000 English Twitter messages labeled with 6 basic emotions: anger, fear, joy, love, sadness, and surprise. The dataset has three splits: train (for training), validation, and test (for performance evaluation). Because each example includes both the text and its category (class), it is a supervised dataset. You can use the &lt;a href="https://huggingface.co/docs/datasets/index">Hugging Face datasets page&lt;/a> to access this dataset.&lt;/p>
&lt;p>The following table shows examples from the train split of the dataset:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">Text&lt;/th>
&lt;th style="text-align:center">Type of emotion&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">im grabbing a minute to post i feel greedy wrong&lt;/td>
&lt;td style="text-align:center">Anger&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">i am ever feeling nostalgic about the fireplace i will know that it is still on the property&lt;/td>
&lt;td style="text-align:center">Love&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">ive been taking or milligrams or times recommended amount and ive fallen asleep a lot faster but i also feel like so funny&lt;/td>
&lt;td style="text-align:center">Fear&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">on a boat trip to denmark&lt;/td>
&lt;td style="text-align:center">Joy&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">i feel you know basically like a fake in the realm of science fiction&lt;/td>
&lt;td style="text-align:center">Sadness&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">i began having them several times a week feeling tortured by the hallucinations moving people and figures sounds and vibrations&lt;/td>
&lt;td style="text-align:center">Fear&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="anomaly-detection-algorithm">Anomaly Detection Algorithm&lt;/h2>
&lt;p>&lt;a href="https://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html">HDBSCAN&lt;/a> is a clustering algorithm that extends DBSCAN by converting it into a hierarchical clustering algorithm and then extracting a flat clustering based on the stability of clusters. Once trained, the model predicts &lt;code>-1&lt;/code> if a new data point is an outlier; otherwise, it predicts one of the existing clusters.&lt;/p>
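&lt;p>As a minimal sketch of how such predictions might be interpreted downstream, the following snippet flags every point whose predicted label is &lt;code>-1&lt;/code>. The helper name &lt;code>flag_outliers&lt;/code> and the sample labels are hypothetical and are not part of the example pipeline.&lt;/p>

```python
OUTLIER_LABEL = -1  # label the trained model assigns to outliers


def flag_outliers(labels):
    """Return one boolean per label: True when the point is an outlier."""
    return [label == OUTLIER_LABEL for label in labels]


# Hypothetical cluster labels predicted for five new data points.
predicted = [0, 2, -1, 1, -1]
print(flag_outliers(predicted))
```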
&lt;h2 id="ingestion-to-pubsub">Ingestion to Pub/Sub&lt;/h2>
&lt;p>Ingest the data into &lt;a href="https://cloud.google.com/pubsub/docs/overview">Pub/Sub&lt;/a> so that while clustering, the model can read the tweets from Pub/Sub. Pub/Sub is a messaging service for exchanging event data among applications and services. Streaming analytics and data integration pipelines use Pub/Sub to ingest and distribute data.&lt;/p>
&lt;p>You can see the full example code for ingesting data into Pub/Sub in &lt;a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/anomaly_detection/write_data_to_pubsub_pipeline/">GitHub&lt;/a>.&lt;/p>
&lt;p>The file structure for the ingestion pipeline is shown in the following diagram:&lt;/p>
&lt;pre>&lt;code>write_data_to_pubsub_pipeline/
├── pipeline/
│ ├── __init__.py
│ ├── options.py
│ └── utils.py
├── __init__.py
├── config.py
├── main.py
└── setup.py
&lt;/code>&lt;/pre>
&lt;p>&lt;code>pipeline/utils.py&lt;/code> contains the code for loading the emotion dataset and two &lt;code>beam.DoFn&lt;/code> that are used for data transformation.&lt;/p>
&lt;p>&lt;code>pipeline/options.py&lt;/code> contains the pipeline options to configure the Dataflow pipeline.&lt;/p>
&lt;p>&lt;code>config.py&lt;/code> defines variables that are used multiple times, like Google Cloud PROJECT_ID and NUM_WORKERS.&lt;/p>
&lt;p>&lt;code>setup.py&lt;/code> defines the packages and requirements for the pipeline to run.&lt;/p>
&lt;p>&lt;code>main.py&lt;/code> contains the pipeline code and additional functions used for running the pipeline.&lt;/p>
&lt;h3 id="run-the-pipeline">Run the Pipeline&lt;/h3>
&lt;p>To run the pipeline, install the required packages. For this example, you need access to a Google Cloud project, and you need to configure the Google Cloud variables, like &lt;code>PROJECT_ID&lt;/code>, &lt;code>REGION&lt;/code>, &lt;code>PubSub TOPIC_ID&lt;/code>, and others in the &lt;code>config.py&lt;/code> file.&lt;/p>
&lt;ol>
&lt;li>Locally on your machine: &lt;code>python main.py&lt;/code>&lt;/li>
&lt;li>On GCP for Dataflow: &lt;code>python main.py --mode cloud&lt;/code>&lt;/li>
&lt;/ol>
&lt;p>The &lt;code>write_data_to_pubsub_pipeline&lt;/code> contains four different transforms:&lt;/p>
&lt;ol>
&lt;li>Load the emotion dataset using Hugging Face datasets (for simplicity, we take samples from three classes instead of six).&lt;/li>
&lt;li>Associate each piece of text with a unique identifier (UID).&lt;/li>
&lt;li>Convert the text into the format that Pub/Sub expects.&lt;/li>
&lt;li>Write the formatted message to Pub/Sub.&lt;/li>
&lt;/ol>
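&lt;p>Steps 2 and 3 above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names &lt;code>assign_uid&lt;/code> and &lt;code>to_pubsub_payload&lt;/code> are hypothetical, and JSON is an assumed wire format. The &lt;code>beam.DoFn&lt;/code> classes in &lt;code>pipeline/utils.py&lt;/code> may be implemented differently.&lt;/p>

```python
import json
import uuid


def assign_uid(text):
    """Pair a piece of text with a freshly generated unique identifier."""
    return {"id": uuid.uuid4().hex, "text": text}


def to_pubsub_payload(record):
    """Serialize a record to UTF-8 encoded JSON bytes.

    Pub/Sub message data is a byte string; JSON is an assumed encoding here.
    """
    return json.dumps(record).encode("utf-8")


record = assign_uid("on a boat trip to denmark")
payload = to_pubsub_payload(record)
print(type(payload).__name__)
```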
&lt;h2 id="anomaly-detection-on-streaming-data">Anomaly Detection on Streaming Data&lt;/h2>
&lt;p>After ingesting the data to Pub/Sub, run the anomaly detection pipeline. This pipeline reads the streaming message from Pub/Sub, converts the text to an embedding using a language model, and feeds the embedding to an already trained clustering model to predict whether the message is an anomaly. One prerequisite for this pipeline is to have an HDBSCAN clustering model trained on the training split of the dataset.&lt;/p>
&lt;p>You can find the full example code for anomaly detection in &lt;a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/anomaly_detection/anomaly_detection_pipeline/">GitHub&lt;/a>.&lt;/p>
&lt;p>The following diagram shows the file structure for the anomaly_detection pipeline:&lt;/p>
&lt;pre>&lt;code>anomaly_detection_pipeline/
├── pipeline/
│ ├── __init__.py
│ ├── options.py
│ └── transformations.py
├── __init__.py
├── config.py
├── main.py
└── setup.py
&lt;/code>&lt;/pre>
&lt;p>&lt;code>pipeline/transformations.py&lt;/code> contains the code for different &lt;code>beam.DoFn&lt;/code> and additional functions that are used in the pipeline.&lt;/p>
&lt;p>&lt;code>pipeline/options.py&lt;/code> contains the pipeline options to configure the Dataflow pipeline.&lt;/p>
&lt;p>&lt;code>config.py&lt;/code> defines variables that are used multiple times, like the Google Cloud PROJECT_ID and NUM_WORKERS.&lt;/p>
&lt;p>&lt;code>setup.py&lt;/code> defines the packages and requirements for the pipeline to run.&lt;/p>
&lt;p>&lt;code>main.py&lt;/code> contains the pipeline code and additional functions used to run the pipeline.&lt;/p>
&lt;h3 id="run-the-pipeline-1">Run the Pipeline&lt;/h3>
&lt;p>Install the required packages and push the data to Pub/Sub. For this example, you need access to a Google Cloud project, and you need to configure the Google Cloud variables, like &lt;code>PROJECT_ID&lt;/code>, &lt;code>REGION&lt;/code>, &lt;code>PubSub SUBSCRIPTION_ID&lt;/code>, and others in the &lt;code>config.py&lt;/code> file.&lt;/p>
&lt;ol>
&lt;li>Locally on your machine: &lt;code>python main.py&lt;/code>&lt;/li>
&lt;li>On GCP for Dataflow: &lt;code>python main.py --mode cloud&lt;/code>&lt;/li>
&lt;/ol>
&lt;p>The pipeline includes the following steps:&lt;/p>
&lt;ol>
&lt;li>Read the message from Pub/Sub.&lt;/li>
&lt;li>Convert the Pub/Sub message into a &lt;code>PCollection&lt;/code> of dictionaries where the key is the UID and the value is the Twitter text.&lt;/li>
&lt;li>Encode the text into transformer-readable token ID integers using a tokenizer.&lt;/li>
&lt;li>Use RunInference to get the vector embedding from a transformer-based language model.&lt;/li>
&lt;li>Normalize the embedding.&lt;/li>
&lt;li>Use RunInference to get anomaly prediction from a trained HDBSCAN clustering model.&lt;/li>
&lt;li>Write the prediction to BigQuery so that the clustering model can be retrained when needed.&lt;/li>
&lt;li>Send an email alert if an anomaly is detected.&lt;/li>
&lt;/ol>
&lt;p>The following code snippet shows the first two steps of the pipeline:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code> docs = (
pipeline
| &amp;#34;Read from PubSub&amp;#34;
&amp;gt;&amp;gt; ReadFromPubSub(subscription=cfg.SUBSCRIPTION_ID, with_attributes=True)
| &amp;#34;Decode PubSubMessage&amp;#34; &amp;gt;&amp;gt; beam.ParDo(Decode())
)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
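&lt;p>The &lt;code>Decode&lt;/code> transform itself is defined in the example repository and is not reproduced here. As a rough sketch of what such a decoding step might do, the following function turns the raw bytes of a message into a dictionary with &lt;code>id&lt;/code> and &lt;code>text&lt;/code> keys, which is the shape that &lt;code>tokenize_sentence&lt;/code> reads later. The function name &lt;code>decode_message&lt;/code> and the JSON payload layout are assumptions made for illustration.&lt;/p>

```python
import json


def decode_message(data):
    """Turn raw Pub/Sub message bytes into a dictionary with 'id' and 'text'.

    Assumes the publisher sent UTF-8 encoded JSON with those two keys.
    """
    record = json.loads(data.decode("utf-8"))
    return {"id": record["id"], "text": record["text"]}


# Hypothetical message payload, as it might arrive from the subscription.
raw = b'{"id": "abc123", "text": "on a boat trip to denmark"}'
print(decode_message(raw))
```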
&lt;p>The next section describes the following pipeline steps:&lt;/p>
&lt;ul>
&lt;li>Tokenizing the text&lt;/li>
&lt;li>Getting the embedding using RunInference&lt;/li>
&lt;li>Getting predictions from the HDBSCAN model&lt;/li>
&lt;/ul>
&lt;h3 id="get-embedding-from-a-language-model">Get Embedding from a Language Model&lt;/h3>
&lt;p>To cluster text data, first map the text into vectors of numerical values suitable for statistical analysis. This example uses a transformer-based language model called &lt;a href="https://huggingface.co/sentence-transformers/stsb-distilbert-base">sentence-transformers/stsb-distilbert-base&lt;/a>. This model maps sentences and paragraphs to a 768-dimensional dense vector space, and you can use it for tasks like clustering or semantic search.&lt;/p>
&lt;p>Because the language model is expecting a tokenized input instead of raw text, start by tokenizing the text. Tokenization is a preprocessing task that transforms text so that it can be fed into the model for getting predictions.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code> normalized_embedding = (
docs
| &amp;#34;Tokenize Text&amp;#34; &amp;gt;&amp;gt; beam.Map(tokenize_sentence)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>Here, &lt;code>tokenize_sentence&lt;/code> is a function that takes a dictionary with a text and an ID, tokenizes the text, and returns a tuple of the text and ID as well as the tokenized output.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>Tokenizer = AutoTokenizer.from_pretrained(cfg.TOKENIZER_NAME)
def tokenize_sentence(input_dict):
&amp;#34;&amp;#34;&amp;#34;
Takes a dictionary with a text and an id, tokenizes the text, and
returns a tuple of the text and id and the tokenized text
Args:
input_dict: a dictionary with the text and id of the sentence
Returns:
A tuple of the text and id, and a dictionary of the tokens.
&amp;#34;&amp;#34;&amp;#34;
text, uid = input_dict[&amp;#34;text&amp;#34;], input_dict[&amp;#34;id&amp;#34;]
tokens = Tokenizer([text], padding=True, truncation=True, return_tensors=&amp;#34;pt&amp;#34;)
tokens = {key: torch.squeeze(val) for key, val in tokens.items()}
return (text, uid), tokens&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>Tokenized output is then passed to the language model to get the embeddings. To get embeddings from the language model, we use &lt;code>RunInference()&lt;/code> from Apache Beam.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code> | &amp;#34;Get Embedding&amp;#34; &amp;gt;&amp;gt; RunInference(KeyedModelHandler(embedding_model_handler))&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>where &lt;code>embedding_model_handler&lt;/code> is:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code> embedding_model_handler = PytorchNoBatchModelHandler(
state_dict_path=cfg.MODEL_STATE_DICT_PATH,
model_class=ModelWrapper,
model_params={&amp;#34;config&amp;#34;: AutoConfig.from_pretrained(cfg.MODEL_CONFIG_PATH)},
device=&amp;#34;cpu&amp;#34;,
)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>We define &lt;code>PytorchNoBatchModelHandler&lt;/code> as a wrapper to &lt;code>PytorchModelHandlerKeyedTensor&lt;/code> to limit batch size to one.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code># Can be removed once: https://github.com/apache/beam/issues/21863 is fixed
class PytorchNoBatchModelHandler(PytorchModelHandlerKeyedTensor):
&amp;#34;&amp;#34;&amp;#34;Wrapper to PytorchModelHandler to limit batch size to 1.
The tokenized strings generated from BertTokenizer may have different
lengths, which doesn&amp;#39;t work with torch.stack() in current RunInference
implementation since stack() requires tensors to be the same size.
Restricting max_batch_size to 1 means there is only 1 example per `batch`
in the run_inference() call.
&amp;#34;&amp;#34;&amp;#34;
def batch_elements_kwargs(self):
return {&amp;#34;max_batch_size&amp;#34;: 1}&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
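&lt;p>To see why the batch size is limited to one, consider the following plain-Python sketch. It only checks sequence lengths and does not call PyTorch. The helper names and the token IDs are hypothetical. Sequences of different lengths cannot form one rectangular batch, whereas a batch that holds a single sequence always can.&lt;/p>

```python
def can_stack(sequences):
    """Return True when all sequences share one length, as stacking requires."""
    return len({len(seq) for seq in sequences}) == 1


def single_element_batches(sequences):
    """Mimic max_batch_size=1 by placing every sequence in its own batch."""
    return [[seq] for seq in sequences]


# Hypothetical token IDs for two sentences of different lengths.
token_ids = [[101, 2023, 102], [101, 2023, 2003, 2146, 102]]
print(can_stack(token_ids))  # False
print([can_stack(batch) for batch in single_element_batches(token_ids)])  # [True, True]
```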
&lt;p>Because the &lt;code>forward()&lt;/code> method of &lt;code>DistilBertModel&lt;/code> doesn&amp;rsquo;t return the embeddings, we define a custom model class, &lt;code>ModelWrapper&lt;/code>, to get the vector embedding.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>class ModelWrapper(DistilBertModel):
&amp;#34;&amp;#34;&amp;#34;Wrapper to DistilBertModel to get embeddings when calling
forward function.&amp;#34;&amp;#34;&amp;#34;
def forward(self, **kwargs):
output = super().forward(**kwargs)
sentence_embedding = (
self.mean_pooling(output,
kwargs[&amp;#34;attention_mask&amp;#34;]).detach().cpu().numpy())
return sentence_embedding
# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(self, model_output, attention_mask):
&amp;#34;&amp;#34;&amp;#34;
Calculates the mean of token embeddings
Args:
model_output: The output of the model.
attention_mask: This is a tensor that contains 1s for all input tokens and
0s for all padding tokens.
Returns:
The mean of the token embeddings.
&amp;#34;&amp;#34;&amp;#34;
token_embeddings = model_output[
0] # First element of model_output contains all token embeddings
input_mask_expanded = (
attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float())
return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
input_mask_expanded.sum(1), min=1e-9)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
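&lt;p>The arithmetic behind &lt;code>mean_pooling&lt;/code> can be checked by hand on a tiny input. The following sketch reimplements the same masked average for a single sentence with plain Python lists instead of tensors. The function name &lt;code>masked_mean&lt;/code> and the sample values are made up for illustration.&lt;/p>

```python
def masked_mean(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        for i in range(dim):
            totals[i] += vector[i] * keep
    # Guard against division by zero, as torch.clamp(..., min=1e-9) does above.
    count = max(float(sum(attention_mask)), 1e-9)
    return [total / count for total in totals]


# Three token positions with 2-dimensional embeddings; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(masked_mean(tokens, mask))  # [2.0, 3.0]
```

&lt;p>The padding position contributes nothing to the sum or to the count, so the result is the average of the two real tokens only.&lt;/p>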
&lt;p>After getting the embedding for each piece of Twitter text, the embeddings are normalized, because the trained model is expecting normalized embeddings.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code> | &amp;#34;Normalize Embedding&amp;#34; &amp;gt;&amp;gt; beam.ParDo(NormalizeEmbedding())&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
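&lt;p>The &lt;code>NormalizeEmbedding&lt;/code> transform is defined in the example repository. As a sketch of the underlying idea, and assuming that normalization here means scaling each vector to unit L2 (Euclidean) length, the computation for one embedding could look like the following. The function name &lt;code>l2_normalize&lt;/code> is hypothetical.&lt;/p>

```python
import math


def l2_normalize(embedding):
    """Scale a vector to unit Euclidean length; leave zero vectors unchanged."""
    norm = math.sqrt(sum(value * value for value in embedding))
    if norm == 0.0:
        return list(embedding)
    return [value / norm for value in embedding]


print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```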
&lt;h3 id="get-predictions">Get Predictions&lt;/h3>
&lt;p>The normalized embeddings are then forwarded to the trained HDBSCAN model to get the predictions.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code> predictions = (
normalized_embedding
| &amp;#34;Get Prediction from Clustering Model&amp;#34;
&amp;gt;&amp;gt; RunInference(model_handler=clustering_model_handler)
)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>where &lt;code>clustering_model_handler&lt;/code> is:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code> clustering_model_handler = KeyedModelHandler(
CustomSklearnModelHandlerNumpy(
model_uri=cfg.CLUSTERING_MODEL_PATH, model_file_type=ModelFileType.JOBLIB
)
)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>We define &lt;code>CustomSklearnModelHandlerNumpy&lt;/code> as a wrapper to &lt;code>SklearnModelHandlerNumpy&lt;/code> to limit batch size to one and to override &lt;code>run_inference&lt;/code> so that &lt;code>hdbscan.approximate_predict()&lt;/code> is used to get anomaly predictions.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>class CustomSklearnModelHandlerNumpy(SklearnModelHandlerNumpy):
# limit batch size to 1 can be removed once: https://github.com/apache/beam/issues/21863 is fixed
def batch_elements_kwargs(self):
&amp;#34;&amp;#34;&amp;#34;Limit batch size to 1 for inference&amp;#34;&amp;#34;&amp;#34;
return {&amp;#34;max_batch_size&amp;#34;: 1}
# run_inference can be removed once: https://github.com/apache/beam/issues/22572 is fixed
def run_inference(self, batch, model, inference_args=None):
&amp;#34;&amp;#34;&amp;#34;Runs inferences on a batch of numpy arrays.
Args:
batch: A sequence of examples as numpy arrays. They should
be single examples.
model: A numpy model or pipeline. Must implement predict(X).
Where the parameter X is a numpy array.
inference_args: Any additional arguments for an inference.
Returns:
An Iterable of type PredictionResult.
&amp;#34;&amp;#34;&amp;#34;
_validate_inference_args(inference_args)
vectorized_batch = np.vstack(batch)
predictions = hdbscan.approximate_predict(model, vectorized_batch)
return [PredictionResult(x, y) for x, y in zip(batch, predictions)]&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>After getting the model predictions, decode the output from &lt;code>RunInference&lt;/code> into a dictionary. Next, store the prediction in a BigQuery table so that it can be analyzed and used to update the HDBSCAN model, and send an email alert if the prediction is an anomaly.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code> _ = (
predictions
| &amp;#34;Decode Prediction&amp;#34; &amp;gt;&amp;gt; beam.ParDo(DecodePrediction())
| &amp;#34;Write to BQ&amp;#34; &amp;gt;&amp;gt; beam.io.WriteToBigQuery(
table=cfg.TABLE_URI,
schema=cfg.TABLE_SCHEMA,
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
))
_ = predictions | &amp;#34;Alert by Email&amp;#34; &amp;gt;&amp;gt; beam.ParDo(TriggerEmailAlert())&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div></description></item><item><title>Documentation: Apache Beam: Developing I/O connectors for Java</title><link>/documentation/io/developing-io-java/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/io/developing-io-java/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="developing-io-connectors-for-java">Developing I/O connectors for Java&lt;/h1>
&lt;p>&lt;strong>IMPORTANT:&lt;/strong> Use &lt;code>Splittable DoFn&lt;/code> to develop your new I/O. For more details, read the
&lt;a href="/documentation/io/developing-io-overview/">new I/O connector overview&lt;/a>.&lt;/p>
&lt;p>To connect to a data store that isn’t supported by Beam’s existing I/O
connectors, you must create a custom I/O connector that usually consists of a
source and a sink. All Beam sources and sinks are composite transforms; however,
the implementation of your custom I/O depends on your use case. Before you
start, read the
&lt;a href="/documentation/io/developing-io-overview/">new I/O connector overview&lt;/a>
for an overview of developing a new I/O connector, the available implementation
options, and how to choose the right option for your use case.&lt;/p>
&lt;p>This guide covers using the &lt;code>Source&lt;/code> and &lt;code>FileBasedSink&lt;/code> interfaces in Java.
The Python SDK offers the same functionality, but uses a slightly different API.
See &lt;a href="/documentation/io/developing-io-python/">Developing I/O connectors for Python&lt;/a>
for information specific to the Python SDK.&lt;/p>
&lt;h2 id="basic-code-reqs">Basic code requirements&lt;/h2>
&lt;p>Beam runners use the classes you provide to read and/or write data using
multiple worker instances in parallel. As such, the code you provide for
&lt;code>Source&lt;/code> and &lt;code>FileBasedSink&lt;/code> subclasses must meet some basic requirements:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Serializability:&lt;/strong> Your &lt;code>Source&lt;/code> or &lt;code>FileBasedSink&lt;/code> subclass, whether
bounded or unbounded, must be Serializable. A runner might create multiple
instances of your &lt;code>Source&lt;/code> or &lt;code>FileBasedSink&lt;/code> subclass to be sent to
multiple remote workers to facilitate reading or writing in parallel.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Immutability:&lt;/strong>
Your &lt;code>Source&lt;/code> or &lt;code>FileBasedSink&lt;/code> subclass must be effectively immutable.
All private fields must be declared final, and all private variables of
collection type must be effectively immutable. If your class has setter
methods, those methods must return an independent copy of the object with
the relevant field modified.&lt;/p>
&lt;p>You should only use mutable state in your &lt;code>Source&lt;/code> or &lt;code>FileBasedSink&lt;/code>
subclass if you are using lazy evaluation of expensive computations that
you need to implement the source or sink; in that case, you must declare
all mutable instance variables transient.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Thread-Safety:&lt;/strong> Your code must be thread-safe. If you build your source
to work with dynamic work rebalancing, it is critical that you make your
code thread-safe. The Beam SDK provides a helper class to make this easier.
See &lt;a href="#bounded-dynamic">Using Your BoundedSource with dynamic work rebalancing&lt;/a>
for more details.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Testability:&lt;/strong> It is critical to exhaustively unit test all of your
&lt;code>Source&lt;/code> and &lt;code>FileBasedSink&lt;/code> subclasses, especially if you build your
classes to work with advanced features such as dynamic work rebalancing. A
minor implementation error can lead to data corruption or data loss (such
as skipping or duplicating records) that can be hard to detect.&lt;/p>
&lt;p>To assist in testing &lt;code>BoundedSource&lt;/code> implementations, you can use the
&lt;code>SourceTestUtils&lt;/code> class. &lt;code>SourceTestUtils&lt;/code> contains utilities for automatically
verifying some of the properties of your &lt;code>BoundedSource&lt;/code> implementation. You
can use &lt;code>SourceTestUtils&lt;/code> to increase your implementation&amp;rsquo;s test coverage
using a wide range of inputs with relatively few lines of code. For
examples that use &lt;code>SourceTestUtils&lt;/code>, see the
&lt;a href="https://github.com/apache/beam/blob/master/sdks/java/extensions/avro/src/test/java/org/apache/beam/sdk/extensions/avro/io/AvroSourceTest.java">AvroSourceTest&lt;/a> and
&lt;a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java">TextIOReadTest&lt;/a>
source code.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>In addition, see the &lt;a href="/contribute/ptransform-style-guide/">PTransform style guide&lt;/a>
for Beam&amp;rsquo;s transform style guidance.&lt;/p>
&lt;h2 id="implementing-the-source-interface">Implementing the Source interface&lt;/h2>
&lt;p>To create a data source for your pipeline, you must provide the format-specific
logic that tells a runner how to read data from your input source, and how to
split your data source into multiple parts so that multiple worker instances can
read your data in parallel. If you&amp;rsquo;re creating a data source that reads
unbounded data, you must provide additional logic for managing your source&amp;rsquo;s
watermark and optional checkpointing.&lt;/p>
&lt;p>Supply the logic for your source by creating the following classes:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>A subclass of &lt;code>BoundedSource&lt;/code> if you want to read a finite (batch) data set,
or a subclass of &lt;code>UnboundedSource&lt;/code> if you want to read an infinite (streaming)
data set. These subclasses describe the data you want to read, including the
data&amp;rsquo;s location and parameters (such as how much data to read).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A subclass of &lt;code>Source.Reader&lt;/code>. Each Source must have an associated Reader that
captures all the state involved in reading from that &lt;code>Source&lt;/code>. This can
include things like file handles, RPC connections, and other parameters that
depend on the specific requirements of the data format you want to read.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The &lt;code>Reader&lt;/code> class hierarchy mirrors the Source hierarchy. If you&amp;rsquo;re extending
&lt;code>BoundedSource&lt;/code>, you&amp;rsquo;ll need to provide an associated &lt;code>BoundedReader&lt;/code>. If you&amp;rsquo;re
extending &lt;code>UnboundedSource&lt;/code>, you&amp;rsquo;ll need to provide an associated
&lt;code>UnboundedReader&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>One or more user-facing wrapper composite transforms (&lt;code>PTransform&lt;/code>) that
wrap read operations. &lt;a href="#ptransform-wrappers">PTransform wrappers&lt;/a> discusses
why you should avoid exposing your sources.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="implementing-the-source-subclass">Implementing the Source subclass&lt;/h3>
&lt;p>You must create a subclass of either &lt;code>BoundedSource&lt;/code> or &lt;code>UnboundedSource&lt;/code>,
depending on whether your data is a finite batch or an infinite stream. In
either case, your &lt;code>Source&lt;/code> subclass must override the abstract methods in the
superclass. A runner might call these methods when using your data source. For
example, when reading from a bounded source, a runner uses these methods to
estimate the size of your data set and to split it up for parallel reading.&lt;/p>
&lt;p>Your &lt;code>Source&lt;/code> subclass should also manage basic information about your data
source, such as the location. For example, the &lt;code>Source&lt;/code> implementation
in Beam’s &lt;a href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/io/gcp/datastore/DatastoreIO.html">DatastoreIO&lt;/a>
class takes host, datasetID, and query as arguments. The connector uses these
values to obtain data from Cloud Datastore.&lt;/p>
&lt;h4 id="boundedsource">BoundedSource&lt;/h4>
&lt;p>&lt;code>BoundedSource&lt;/code> represents a finite data set from which a Beam runner may read,
possibly in parallel. &lt;code>BoundedSource&lt;/code> contains a set of abstract methods that
the runner uses to split the data set for reading by multiple workers.&lt;/p>
&lt;p>To implement a &lt;code>BoundedSource&lt;/code>, your subclass must override the following
abstract methods:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>split&lt;/code>: The runner uses this method to split your finite data
into bundles of a given size.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>getEstimatedSizeBytes&lt;/code>: The runner uses this method to estimate the total
size of your data, in bytes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>createReader&lt;/code>: Creates the associated &lt;code>BoundedReader&lt;/code> for this
&lt;code>BoundedSource&lt;/code>.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>You can see a model of how to implement &lt;code>BoundedSource&lt;/code> and the required
abstract methods in Beam’s implementations for Cloud BigTable
(&lt;a href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java">BigtableIO.java&lt;/a>)
and BigQuery (&lt;a href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySourceBase.java">BigQuerySourceBase.java&lt;/a>).&lt;/p>
&lt;h4 id="unboundedsource">UnboundedSource&lt;/h4>
&lt;p>&lt;code>UnboundedSource&lt;/code> represents an infinite data stream from which the runner may
read, possibly in parallel. &lt;code>UnboundedSource&lt;/code> contains a set of abstract methods
that the runner uses to support streaming reads in parallel; these include
&lt;em>checkpointing&lt;/em> for failure recovery, &lt;em>record IDs&lt;/em> to prevent data duplication,
and &lt;em>watermarking&lt;/em> for estimating data completeness in downstream parts of your
pipeline.&lt;/p>
&lt;p>To implement an &lt;code>UnboundedSource&lt;/code>, your subclass must override the following
abstract methods:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>split&lt;/code>: The runner uses this method to generate a list of
&lt;code>UnboundedSource&lt;/code> objects which represent the number of sub-stream instances
from which the service should read in parallel.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>getCheckpointMarkCoder&lt;/code>: The runner uses this method to obtain the Coder for
the checkpoints for your source (if any).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>requiresDeduping&lt;/code>: The runner uses this method to determine whether the data
requires explicit removal of duplicate records. If this method returns true,
the runner will automatically insert a step to remove duplicates from your
source&amp;rsquo;s output. This should return true if and only if your source
provides record IDs for each record. See &lt;code>UnboundedReader.getCurrentRecordId&lt;/code>
for when this should be done.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>createReader&lt;/code>: Creates the associated &lt;code>UnboundedReader&lt;/code> for this
&lt;code>UnboundedSource&lt;/code>.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="implementing-the-reader-subclass">Implementing the Reader subclass&lt;/h3>
&lt;p>You must create a subclass of either &lt;code>BoundedReader&lt;/code> or &lt;code>UnboundedReader&lt;/code> to be
returned by your source subclass&amp;rsquo;s &lt;code>createReader&lt;/code> method. The runner uses the
methods in your &lt;code>Reader&lt;/code> (whether bounded or unbounded) to do the actual reading
of your dataset.&lt;/p>
&lt;p>&lt;code>BoundedReader&lt;/code> and &lt;code>UnboundedReader&lt;/code> have similar basic interfaces, which
you&amp;rsquo;ll need to define. There are also some methods unique to
&lt;code>UnboundedReader&lt;/code> that you&amp;rsquo;ll need to implement for working with unbounded data,
and an optional method you can implement if you want your &lt;code>BoundedReader&lt;/code> to
take advantage of dynamic work rebalancing. There are also minor differences in
the semantics for the &lt;code>start()&lt;/code> and &lt;code>advance()&lt;/code> methods when using
&lt;code>UnboundedReader&lt;/code>.&lt;/p>
&lt;h4 id="reader-methods-common-to-both-boundedreader-and-unboundedreader">Reader methods common to both BoundedReader and UnboundedReader&lt;/h4>
&lt;p>A runner uses the following methods to read data using &lt;code>BoundedReader&lt;/code> or
&lt;code>UnboundedReader&lt;/code>:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>start&lt;/code>: Initializes the &lt;code>Reader&lt;/code> and advances to the first record to be read.
This method is called exactly once when the runner begins reading your data,
and is a good place to put expensive operations needed for initialization.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>advance&lt;/code>: Advances the reader to the next valid record. This method must
return false if there is no more input available. &lt;code>BoundedReader&lt;/code> should stop
reading once advance returns false, but &lt;code>UnboundedReader&lt;/code> can return true in
future calls once more data is available from your stream.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>getCurrent&lt;/code>: Returns the data record at the current position, last read by
start or advance.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>getCurrentTimestamp&lt;/code>: Returns the timestamp for the current data record. You
only need to override &lt;code>getCurrentTimestamp&lt;/code> if your source reads data that has
intrinsic timestamps. The runner uses this value to set the intrinsic
timestamp for each element in the resulting output &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="reader-methods-unique-to-unboundedreader">Reader methods unique to UnboundedReader&lt;/h4>
&lt;p>In addition to the basic &lt;code>Reader&lt;/code> interface, &lt;code>UnboundedReader&lt;/code> has some
additional methods for managing reads from an unbounded data source:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>getCurrentRecordId&lt;/code>: Returns a unique identifier for the current record.
The runner uses these record IDs to filter out duplicate records. If your
data has logical IDs present in each record, you can have this method return
them; otherwise, you can return a hash of the record contents, using at
least a 128-bit hash. It is incorrect to use Java&amp;rsquo;s &lt;code>Object.hashCode()&lt;/code>, as
a 32-bit hash is generally insufficient for preventing collisions, and
&lt;code>hashCode()&lt;/code> is not guaranteed to be stable across processes.&lt;/p>
&lt;p>Implementing &lt;code>getCurrentRecordId&lt;/code> is optional if your source uses a
checkpointing scheme that uniquely identifies each record. For example, if
your splits are files and the checkpoints are file positions up to which all
data has been read, you do not need record IDs. However, record IDs can
still be useful if upstream systems writing data to your source occasionally
produce duplicate records that your source might then read.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>getWatermark&lt;/code>: Returns a watermark that your &lt;code>Reader&lt;/code> provides. The watermark
is the approximate lower bound on timestamps of future elements to be read
by your &lt;code>Reader&lt;/code>. The runner uses the watermark as an estimate of data
completeness. Watermarks are used in windowing and triggers.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>getCheckpointMark&lt;/code>: The runner uses this method to create a checkpoint in
your data stream. The checkpoint represents the progress of the
&lt;code>UnboundedReader&lt;/code>, which can be used for failure recovery. Different data
streams may use different checkpointing methods; some sources might require
received records to be acknowledged, while others might use positional
checkpointing. You&amp;rsquo;ll need to tailor this method to the most appropriate
checkpointing scheme. For example, you might have this method return the
most recently acked record(s).&lt;/p>
&lt;p>&lt;code>getCheckpointMark&lt;/code> is optional; you don&amp;rsquo;t need to implement it if your data
does not have meaningful checkpoints. However, if you choose not to
implement checkpointing in your source, you may encounter duplicate data or
data loss in your pipeline, depending on whether your data source tries to
re-send records in case of errors.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>You can read a bounded &lt;code>PCollection&lt;/code> from an &lt;code>UnboundedSource&lt;/code> by specifying
either &lt;code>.withMaxNumRecords&lt;/code> or &lt;code>.withMaxReadTime&lt;/code> when you read from your
source. &lt;code>.withMaxNumRecords&lt;/code> reads a fixed maximum number of records from your
unbounded source, while &lt;code>.withMaxReadTime&lt;/code> reads from your unbounded source for
a fixed maximum time duration.&lt;/p>
&lt;h4 id="bounded-dynamic">Using your BoundedSource with dynamic work rebalancing&lt;/h4>
&lt;p>If your source provides bounded data, you can have your &lt;code>BoundedReader&lt;/code> work
with dynamic work rebalancing by implementing the method &lt;code>splitAtFraction&lt;/code>. The
runner may call &lt;code>splitAtFraction&lt;/code> concurrently with start or advance on a given
reader so that the remaining data in your &lt;code>Source&lt;/code> can be split and
redistributed to other workers.&lt;/p>
&lt;p>When you implement &lt;code>splitAtFraction&lt;/code>, your code must produce a
mutually-exclusive set of splits where the union of those splits matches the
total data set.&lt;/p>
&lt;p>If you implement &lt;code>splitAtFraction&lt;/code>, you must implement both &lt;code>splitAtFraction&lt;/code>
and &lt;code>getFractionConsumed&lt;/code> in a thread-safe manner, or data loss is possible. You
should also unit-test your implementation exhaustively to avoid data duplication
or data loss.&lt;/p>
&lt;p>To ensure that your code is thread-safe, use the &lt;code>RangeTracker&lt;/code> thread-safe
helper object to manage positions in your data source when implementing
&lt;code>splitAtFraction&lt;/code> and &lt;code>getFractionConsumed&lt;/code>.&lt;/p>
&lt;p>We highly recommend that you unit test your implementations of
&lt;code>splitAtFraction&lt;/code> using the &lt;code>SourceTestUtils&lt;/code> class. &lt;code>SourceTestUtils&lt;/code> contains
a number of methods for testing your implementation of &lt;code>splitAtFraction&lt;/code>,
including exhaustive automatic testing.&lt;/p>
&lt;h3 id="convenience-source-and-reader-base-classes">Convenience Source and Reader base classes&lt;/h3>
&lt;p>The Beam SDK contains some convenient abstract base classes to help you create
&lt;code>Source&lt;/code> and &lt;code>Reader&lt;/code> classes that work with common data storage formats, like
files.&lt;/p>
&lt;h4 id="filebasedsource">FileBasedSource&lt;/h4>
&lt;p>If your data source uses files, you can derive your &lt;code>Source&lt;/code> and &lt;code>Reader&lt;/code>
classes from the &lt;code>FileBasedSource&lt;/code> and &lt;code>FileBasedReader&lt;/code> abstract base classes.
&lt;code>FileBasedSource&lt;/code> is a bounded source subclass that implements code common to
Beam sources that interact with files, including:&lt;/p>
&lt;ul>
&lt;li>File pattern expansion&lt;/li>
&lt;li>Sequential record reading&lt;/li>
&lt;li>Split points&lt;/li>
&lt;/ul>
&lt;h2 id="using-filebasedsink">Using the FileBasedSink abstraction&lt;/h2>
&lt;p>If your data sink uses files, you can implement the &lt;code>FileBasedSink&lt;/code>
abstraction to create a file-based sink. For other sinks, use &lt;code>ParDo&lt;/code>,
&lt;code>GroupByKey&lt;/code>, and other transforms offered by the Beam SDK for Java. See the
&lt;a href="/documentation/io/developing-io-overview/">developing I/O connectors overview&lt;/a>
for more details.&lt;/p>
&lt;p>When using the &lt;code>FileBasedSink&lt;/code> interface, you must provide the format-specific
logic that tells the runner how to write bounded data from your pipeline&amp;rsquo;s
&lt;code>PCollection&lt;/code>s to an output sink. The runner writes bundles of data in parallel
using multiple workers.&lt;/p>
&lt;p>Supply the logic for your file-based sink by implementing the following classes:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>A subclass of the abstract base class &lt;code>FileBasedSink&lt;/code>. &lt;code>FileBasedSink&lt;/code>
describes a location or resource that your pipeline can write to in
parallel. To avoid exposing your sink to end-users, your &lt;code>FileBasedSink&lt;/code>
subclass should be protected or private.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A user-facing wrapper &lt;code>PTransform&lt;/code> that, as part of the logic, calls
&lt;a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java">WriteFiles&lt;/a>
and passes your &lt;code>FileBasedSink&lt;/code> as a parameter. A user should not need to
call &lt;code>WriteFiles&lt;/code> directly.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>The &lt;code>FileBasedSink&lt;/code> abstract base class implements code that is common to Beam
sinks that interact with files, including:&lt;/p>
&lt;ul>
&lt;li>Setting file headers and footers&lt;/li>
&lt;li>Sequential record writing&lt;/li>
&lt;li>Setting the output MIME type&lt;/li>
&lt;/ul>
&lt;p>&lt;code>FileBasedSink&lt;/code> and its subclasses support writing files to any Beam-supported
&lt;code>FileSystem&lt;/code> implementations. See the following Beam-provided &lt;code>FileBasedSink&lt;/code>
implementations for examples:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSink.java">TextSink&lt;/a> and&lt;/li>
&lt;li>&lt;a href="https://github.com/apache/beam/blob/master/sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/io/AvroSink.java">AvroSink&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h2 id="ptransform-wrappers">PTransform wrappers&lt;/h2>
&lt;p>When you create a source or sink that end-users will use, avoid exposing your
source or sink code: make your new classes protected or private. Then, implement a user-facing
wrapper &lt;code>PTransform&lt;/code>. By exposing your source or sink as a transform, your
implementation is hidden and can be arbitrarily complex or simple. The greatest
benefit of not exposing implementation details is that later on, you can add
additional functionality without breaking the existing implementation for users.&lt;/p>
&lt;p>For example, if your users’ pipelines read from your source using
&lt;code>read&lt;/code> and you want to insert a reshard into the pipeline, all
users would need to add the reshard themselves (using the &lt;code>GroupByKey&lt;/code>
transform). To solve this, we recommend that you expose the source as a
composite &lt;code>PTransform&lt;/code> that performs both the read operation and the reshard.&lt;/p>
&lt;p>See Beam’s &lt;a href="/contribute/ptransform-style-guide/#exposing-a-ptransform-vs-something-else">PTransform style guide&lt;/a>
for additional information about wrapping with a &lt;code>PTransform&lt;/code>.&lt;/p></description></item><item><title>Documentation: Apache Beam: Developing I/O connectors for Python</title><link>/documentation/io/developing-io-python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/io/developing-io-python/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="developing-io-connectors-for-python">Developing I/O connectors for Python&lt;/h1>
&lt;p>&lt;strong>IMPORTANT:&lt;/strong> Please use &lt;code>Splittable DoFn&lt;/code> to develop your new I/O. For more details, please read
the &lt;a href="/documentation/io/developing-io-overview/">new I/O connector overview&lt;/a>.&lt;/p>
&lt;p>To connect to a data store that isn’t supported by Beam’s existing I/O
connectors, you must create a custom I/O connector that usually consists of a
source and a sink. All Beam sources and sinks are composite transforms; however,
the implementation of your custom I/O depends on your use case. Before you
start, read the &lt;a href="/documentation/io/developing-io-overview/">new I/O connector overview&lt;/a>
for an overview of developing a new I/O connector, the available implementation
options, and how to choose the right option for your use case.&lt;/p>
&lt;p>This guide covers using the &lt;a href="https://beam.apache.org/releases/pydoc/2.56.0/apache_beam.io.iobase.html">Source and FileBasedSink interfaces&lt;/a>
for Python. The Java SDK offers the same functionality, but uses a slightly
different API. See &lt;a href="/documentation/io/developing-io-java/">Developing I/O connectors for Java&lt;/a>
for information specific to the Java SDK.&lt;/p>
&lt;h2 id="basic-code-reqs">Basic code requirements&lt;/h2>
&lt;p>Beam runners use the classes you provide to read and/or write data using
multiple worker instances in parallel. As such, the code you provide for
&lt;code>Source&lt;/code> and &lt;code>FileBasedSink&lt;/code> subclasses must meet some basic requirements:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Serializability:&lt;/strong> Your &lt;code>Source&lt;/code> or &lt;code>FileBasedSink&lt;/code> subclass must be
serializable. The service may create multiple instances of your &lt;code>Source&lt;/code>
or &lt;code>FileBasedSink&lt;/code> subclass to be sent to multiple remote workers to
facilitate reading or writing in parallel. The &lt;em>way&lt;/em> the source and sink
objects are serialized is runner specific.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Immutability:&lt;/strong> Your &lt;code>Source&lt;/code> or &lt;code>FileBasedSink&lt;/code> subclass must be
effectively immutable. You should only use mutable state in your &lt;code>Source&lt;/code>
or &lt;code>FileBasedSink&lt;/code> subclass if you are using lazy evaluation of expensive
computations that you need to implement the source.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Thread-Safety:&lt;/strong> Your code must be thread-safe. The Beam SDK for Python
provides the &lt;code>RangeTracker&lt;/code> class to make this easier.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Testability:&lt;/strong> It is critical to exhaustively unit-test all of your
&lt;code>Source&lt;/code> and &lt;code>FileBasedSink&lt;/code> subclasses. A minor implementation error can
lead to data corruption or data loss (such as skipping or duplicating
records) that can be hard to detect. You can use test harnesses and utility
methods available in the &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/source_test_utils.py">source_test_utils module&lt;/a>
to develop tests for your source.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>In addition, see the &lt;a href="/contribute/ptransform-style-guide/">PTransform style guide&lt;/a>
for Beam&amp;rsquo;s transform style guidance.&lt;/p>
&lt;h2 id="implementing-the-source-interface">Implementing the Source interface&lt;/h2>
&lt;p>To create a new data source for your pipeline, you&amp;rsquo;ll need to provide the format-specific logic that tells the service how to read data from your input source, and how to split your data source into multiple parts so that multiple worker instances can read your data in parallel.&lt;/p>
&lt;p>Supply the logic for your new source by creating the following classes:&lt;/p>
&lt;ul>
&lt;li>A subclass of &lt;code>BoundedSource&lt;/code>. &lt;code>BoundedSource&lt;/code> is a source that reads a
finite amount of input records. The class describes the data you want to
read, including the data&amp;rsquo;s location and parameters (such as how much data to
read).&lt;/li>
&lt;li>A subclass of &lt;code>RangeTracker&lt;/code>. &lt;code>RangeTracker&lt;/code> is a thread-safe object used to
manage a range for a given position type.&lt;/li>
&lt;li>One or more user-facing wrapper composite transforms (&lt;code>PTransform&lt;/code>) that
wrap read operations. &lt;a href="#ptransform-wrappers">PTransform wrappers&lt;/a> discusses
why you should avoid exposing your sources, and walks through how to create
a wrapper.&lt;/li>
&lt;/ul>
&lt;p>You can find these classes in the
&lt;a href="https://beam.apache.org/releases/pydoc/2.56.0/apache_beam.io.iobase.html">apache_beam.io.iobase module&lt;/a>.&lt;/p>
&lt;h3 id="implementing-the-boundedsource-subclass">Implementing the BoundedSource subclass&lt;/h3>
&lt;p>&lt;code>BoundedSource&lt;/code> represents a finite data set from which the service reads, possibly in parallel. &lt;code>BoundedSource&lt;/code> contains a set of methods that the service uses to split the data set for reading by multiple remote workers.&lt;/p>
&lt;p>To implement a &lt;code>BoundedSource&lt;/code>, your subclass must override the following methods:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>estimate_size&lt;/code>: Services use this method to estimate the &lt;em>total size&lt;/em> of your data, in bytes. This estimate is in terms of external storage size, before performing decompression or other processing.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>split&lt;/code>: Services use this method to split your finite data into bundles of a given size.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>get_range_tracker&lt;/code>: Services use this method to get the &lt;code>RangeTracker&lt;/code> for a given position range, and use the information to report progress and perform dynamic splitting of sources.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>read&lt;/code>: This method returns an iterator that reads data from the source, with respect to the boundaries defined by the given &lt;code>RangeTracker&lt;/code> object.&lt;/p>
&lt;/li>
&lt;/ul>
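&lt;p>To make the &lt;code>split&lt;/code> contract more concrete, here is a minimal standalone sketch in plain Python, with no Beam imports. It cuts an offset range into consecutive bundles of at most a desired size. &lt;code>Bundle&lt;/code> and &lt;code>split_offset_range&lt;/code> are hypothetical names used only for illustration; a real &lt;code>BoundedSource.split&lt;/code> would yield Beam&amp;rsquo;s own bundle objects rather than this stand-in.&lt;/p>

```python
from typing import Iterator, NamedTuple


class Bundle(NamedTuple):
    """Hypothetical stand-in for one unit of parallel work."""
    weight: int
    start_position: int
    stop_position: int


def split_offset_range(start: int, stop: int,
                       desired_bundle_size: int) -> Iterator[Bundle]:
    """Yields consecutive, non-overlapping bundles covering [start, stop)."""
    if not desired_bundle_size > 0:
        raise ValueError("desired_bundle_size must be positive")
    bundle_start = start
    while stop > bundle_start:
        # The last bundle may be smaller than the desired size.
        bundle_stop = min(stop, bundle_start + desired_bundle_size)
        yield Bundle(bundle_stop - bundle_start, bundle_start, bundle_stop)
        bundle_start = bundle_stop


# Splitting offsets [0, 10) with a desired bundle size of 4.
bundles = list(split_offset_range(0, 10, 4))
```

&lt;p>Because each bundle starts where the previous one stops, the bundles are mutually exclusive and their union is the original range, which is the property a runner relies on when it reads bundles in parallel.&lt;/p>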
&lt;h3 id="implementing-the-rangetracker-subclass">Implementing the RangeTracker subclass&lt;/h3>
&lt;p>A &lt;code>RangeTracker&lt;/code> is a thread-safe object used to manage the current range and current position of the reader of a &lt;code>BoundedSource&lt;/code> and protect concurrent access to them.&lt;/p>
&lt;p>To implement a &lt;code>RangeTracker&lt;/code>, you should first familiarize yourself with the following definitions:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Position-based sources&lt;/strong> - A position-based source can be described by a range of positions of an ordered type, and the records read by the source can be described by positions of that type. For example, for a record within a file, the position can be the starting byte offset of the record. The position type for the record in this case is &lt;code>long&lt;/code>.&lt;/p>
&lt;p>The main requirement for position-based sources is &lt;strong>associativity&lt;/strong>: Reading records in position range &amp;lsquo;[A, B)&amp;rsquo; and records in position range &amp;lsquo;[B, C)&amp;rsquo; should give the same records as reading records in position range &amp;lsquo;[A, C)&amp;rsquo;, where &amp;lsquo;A&amp;rsquo; &amp;lt;= &amp;lsquo;B&amp;rsquo; &amp;lt;= &amp;lsquo;C&amp;rsquo;. This property ensures that no matter how many arbitrary sub-ranges a range of positions is split into, the total set of records they describe stays the same.&lt;/p>
&lt;p>The other important property is how the source&amp;rsquo;s range relates to positions of records in the source. In many sources each record can be identified by a unique starting position. In this case:&lt;/p>
&lt;ul>
&lt;li>All records returned by a source &amp;lsquo;[A, B)&amp;rsquo; must have starting positions in this range.&lt;/li>
&lt;li>All but the last record should end within this range. The last record may or may not extend past the end of the range.&lt;/li>
&lt;li>Records must not overlap.&lt;/li>
&lt;/ul>
&lt;p>Such sources should define &amp;ldquo;read &amp;lsquo;[A, B)&amp;rsquo;&amp;rdquo; as &amp;ldquo;read from the first record starting at or after &amp;lsquo;A&amp;rsquo;, up to but not including the first record starting at or after &amp;lsquo;B&amp;rsquo;&amp;rdquo;.&lt;/p>
&lt;p>Some examples of such sources include reading lines or CSV from a text file, reading keys and values from a database, etc.&lt;/p>
&lt;p>The concept of &lt;em>split points&lt;/em> lets you extend these definitions to sources where some records cannot be identified by a unique starting position.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Split points&lt;/strong> - A split point describes a record that is the first one returned when reading the range from and including position &lt;strong>A&lt;/strong> up to infinity (i.e. [A, infinity)).&lt;/p>
&lt;p>Some sources may have records that are not directly addressable. For example, imagine a file format consisting of a sequence of compressed blocks. Each block can be assigned an offset, but records within the block cannot be directly addressed without decompressing the block. Let us refer to this hypothetical format as &lt;em>CBF (Compressed Blocks Format)&lt;/em>.&lt;/p>
&lt;p>Many such formats can still satisfy the associativity property. For example, in CBF, reading [A, B) can mean &amp;ldquo;read all the records in all blocks whose starting offset is in [A, B)&amp;rdquo;.&lt;/p>
&lt;p>To support such complex formats, Beam introduces the notion of &lt;em>split points&lt;/em>. A record is a split point if there exists a position &lt;strong>A&lt;/strong> such that the record is the first one to be returned when reading the range [A, infinity). In CBF, the only split points would be the first records in each block.&lt;/p>
&lt;p>Split points allow us to define the meaning of a record&amp;rsquo;s position and a source&amp;rsquo;s range in the following cases:&lt;/p>
&lt;ul>
&lt;li>For a record that is at a split point, its position is defined to be the largest &lt;strong>A&lt;/strong> such that reading a source with the range [A, infinity) returns this record.&lt;/li>
&lt;li>Positions of other records are only required to be non-decreasing.&lt;/li>
&lt;li>Reading the source [A, B) must return records starting from the first split point at or after &lt;strong>A&lt;/strong>, up to but not including the first split point at or after &lt;strong>B&lt;/strong>. In particular, this means that the first record returned by a source MUST always be a split point.&lt;/li>
&lt;li>Positions of split points must be unique.&lt;/li>
&lt;/ul>
&lt;p>As a result, for any decomposition of the full range of the source into position ranges, the total set of records will be the full set of records in the source, and each record will be read exactly once.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Consumed positions&lt;/strong> - Consumed positions refer to records that have been read.&lt;/p>
&lt;p>As the source is being read, and records read from it are being passed to the downstream transforms in the pipeline, we say that positions in the source are being &lt;em>consumed&lt;/em>. When a reader has read a record (or promised to a caller that a record will be returned), positions up to and including the record&amp;rsquo;s start position are considered &lt;em>consumed&lt;/em>.&lt;/p>
&lt;p>Dynamic splitting can happen only at &lt;em>unconsumed&lt;/em> positions. If the reader just returned a record at offset 42 in a file, dynamic splitting can happen only at offset 43 or beyond. Otherwise, that record could be read twice (by the current reader and the reader of the new task).&lt;/p>
&lt;/li>
&lt;/ul>
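&lt;p>The associativity requirement for position-based sources can be checked with a small standalone sketch. The hypothetical &lt;code>read_lines&lt;/code> function below is not part of Beam. It assumes newline-delimited records in an in-memory byte string and applies the rule described above: read from the first record starting at or after the start position, up to but not including the first record starting at or after the stop position.&lt;/p>

```python
def read_lines(data: bytes, start: int, stop: int):
    """Returns (offset, line) pairs for the lines that start in [start, stop).

    The last returned line may extend past `stop`; only its starting
    offset has to fall inside the range.
    """
    if start == 0:
        pos = 0
    else:
        # A record starts right after a newline, so the first record at or
        # after `start` follows the first newline at or after `start - 1`.
        newline = data.find(b"\n", start - 1)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    while stop > pos and len(data) > pos:
        newline = data.find(b"\n", pos)
        end = len(data) if newline == -1 else newline + 1
        records.append((pos, data[pos:end]))
        pos = end
    return records


data = b"alpha\nbeta\ngamma\n"
whole = read_lines(data, 0, len(data))
# Split the range at an arbitrary offset in the middle of a record.
left = read_lines(data, 0, 8)
right = read_lines(data, 8, len(data))
```

&lt;p>In this sketch, &lt;code>left + right&lt;/code> contains the same records as &lt;code>whole&lt;/code>, even though offset 8 falls inside the second line: that line starts inside the left range, so the left read returns it in full and the right read skips ahead to the next record start.&lt;/p>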
&lt;h4 id="rangetracker-methods">RangeTracker methods&lt;/h4>
&lt;p>To implement a &lt;code>RangeTracker&lt;/code>, your subclass must override the following methods:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>start_position&lt;/code>: Returns the starting position of the current range, inclusive.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>stop_position&lt;/code>: Returns the ending position of the current range, exclusive.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>try_claim&lt;/code>: This method is used to determine if a record at a split point is within the range. This method should modify the internal state of the &lt;code>RangeTracker&lt;/code> by updating the last-consumed position to the given starting &lt;code>position&lt;/code> of the record being read by the source. The method returns true if the given position falls within the current range.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>set_current_position&lt;/code>: This method updates the last-consumed position to the given starting position of a record being read by a source. You can invoke this method for records that do not start at split points, and this should modify the internal state of the &lt;code>RangeTracker&lt;/code>. If the record starts at a split point, you must invoke &lt;code>try_claim&lt;/code> instead of this method.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>position_at_fraction&lt;/code>: Given a fraction within the range [0.0, 1.0), this method will return the position at the given fraction compared to the position range [&lt;code>self.start_position&lt;/code>, &lt;code>self.stop_position&lt;/code>).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>try_split&lt;/code>: This method attempts to split the current range into two parts around a suggested position. It is allowed to split at a different position, but in most cases it will split at the suggested position.&lt;/p>
&lt;p>This method splits the current range [&lt;code>self.start_position&lt;/code>, &lt;code>self.stop_position&lt;/code>) into a &amp;ldquo;primary&amp;rdquo; part [&lt;code>self.start_position&lt;/code>, &lt;code>split_position&lt;/code>), and a &amp;ldquo;residual&amp;rdquo; part [&lt;code>split_position&lt;/code>, &lt;code>self.stop_position&lt;/code>), assuming that &lt;code>split_position&lt;/code> has not been consumed yet.&lt;/p>
&lt;p>If &lt;code>split_position&lt;/code> has already been consumed, the method returns &lt;code>None&lt;/code>. Otherwise, it updates the current range to be the primary and returns a tuple (&lt;code>split_position&lt;/code>, &lt;code>split_fraction&lt;/code>). &lt;code>split_fraction&lt;/code> should be the fraction of size of range [&lt;code>self.start_position&lt;/code>, &lt;code>split_position&lt;/code>) compared to the original (before split) range [&lt;code>self.start_position&lt;/code>, &lt;code>self.stop_position&lt;/code>).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>fraction_consumed&lt;/code>: Returns the approximate fraction of consumed positions in the source.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note:&lt;/strong> Methods of class &lt;code>iobase.RangeTracker&lt;/code> may be invoked by multiple threads, hence this class must be made thread-safe, for example, by using a single lock object.&lt;/p>
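&lt;p>The contract described above can be illustrated with a minimal sketch. The class below, &lt;code>SimpleOffsetTracker&lt;/code>, is a hypothetical stand-in written for this explanation; it is not Beam&amp;rsquo;s &lt;code>OffsetRangeTracker&lt;/code> and does not subclass &lt;code>iobase.RangeTracker&lt;/code>. It assumes integer positions and a non-empty range, and it guards all mutable state with a single lock, as the note suggests.&lt;/p>

```python
import threading


class SimpleOffsetTracker:
    """Illustrative tracker over integer offsets (hypothetical, not a Beam class)."""

    def __init__(self, start, stop):
        if start >= stop:
            raise ValueError('this sketch assumes a non-empty range')
        self._start = start
        self._stop = stop
        self._last_claimed = None  # nothing consumed yet
        self._lock = threading.Lock()  # one lock guards all mutable state

    def try_claim(self, position):
        """Claim a split point; False means it lies outside the current range."""
        with self._lock:
            if position >= self._stop:
                return False
            self._last_claimed = position
            return True

    def position_at_fraction(self, fraction):
        """Map a fraction in [0.0, 1.0) onto the range [start, stop)."""
        with self._lock:
            return self._start + int(fraction * (self._stop - self._start))

    def try_split(self, split_position):
        """Shrink the range to the primary part, or return None if impossible."""
        with self._lock:
            consumed = (self._last_claimed is not None
                        and self._last_claimed >= split_position)
            out_of_range = (self._start >= split_position
                            or split_position >= self._stop)
            if consumed or out_of_range:
                return None
            # Size of the primary part relative to the range before the split.
            fraction = (split_position - self._start) / (self._stop - self._start)
            self._stop = split_position  # the residual part is handed off
            return split_position, fraction

    def fraction_consumed(self):
        """Approximate fraction of the current range already claimed."""
        with self._lock:
            if self._last_claimed is None:
                return 0.0
            return (self._last_claimed + 1 - self._start) / (self._stop - self._start)


# A reader claims positions one by one; a split request arrives part-way through.
tracker = SimpleOffsetTracker(0, 10)
records = []
result = None
for i in range(10):
    if i == 3:
        result = tracker.try_split(tracker.position_at_fraction(0.5))
    if not tracker.try_claim(i):
        break
    records.append(i)

print(records)  # [0, 1, 2, 3, 4]
print(result)   # (5, 0.5)
```

&lt;p>After the split, the reader stops at position 5, because &lt;code>try_claim(5)&lt;/code> now falls outside the primary range. The residual range [5, 10) would be read separately, so each record is still produced exactly once.&lt;/p>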
&lt;h3 id="convenience-source-base-classes">Convenience Source base classes&lt;/h3>
&lt;p>The Beam SDK for Python contains some convenient abstract base classes to help you easily create new sources.&lt;/p>
&lt;h4 id="filebasedsource">FileBasedSource&lt;/h4>
&lt;p>&lt;code>FileBasedSource&lt;/code> is a framework for developing sources for new file types. You can derive your &lt;code>BoundedSource&lt;/code> class from the &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsource.py">FileBasedSource&lt;/a> class.&lt;/p>
&lt;p>To create a source for a new file type, you need to create a sub-class of &lt;code>FileBasedSource&lt;/code>. Sub-classes of &lt;code>FileBasedSource&lt;/code> must implement the method &lt;code>FileBasedSource.read_records()&lt;/code>.&lt;/p>
&lt;p>See &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py">AvroSource&lt;/a> for an example implementation of &lt;code>FileBasedSource&lt;/code>.&lt;/p>
&lt;h3 id="reading-from-a-new-source">Reading from a new Source&lt;/h3>
&lt;p>The following example, &lt;code>CountingSource&lt;/code>, demonstrates an implementation of &lt;code>BoundedSource&lt;/code> and uses the SDK-provided &lt;code>RangeTracker&lt;/code> called &lt;code>OffsetRangeTracker&lt;/code>.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>from apache_beam.io import iobase
from apache_beam.io.range_trackers import OffsetRangeTracker
from apache_beam.metrics import Metrics


class CountingSource(iobase.BoundedSource):
  def __init__(self, count):
    self.records_read = Metrics.counter(self.__class__, &amp;#39;recordsRead&amp;#39;)
    self._count = count

  def estimate_size(self):
    return self._count

  def get_range_tracker(self, start_position, stop_position):
    # Default to the full range [0, count) when no bounds are given.
    if start_position is None:
      start_position = 0
    if stop_position is None:
      stop_position = self._count
    return OffsetRangeTracker(start_position, stop_position)

  def read(self, range_tracker):
    for i in range(range_tracker.start_position(),
                   range_tracker.stop_position()):
      # Stop as soon as a position can no longer be claimed.
      if not range_tracker.try_claim(i):
        return
      self.records_read.inc()
      yield i

  def split(self, desired_bundle_size, start_position=None, stop_position=None):
    if start_position is None:
      start_position = 0
    if stop_position is None:
      stop_position = self._count
    bundle_start = start_position
    # Emit consecutive bundles of at most desired_bundle_size positions.
    while bundle_start &amp;lt; stop_position:
      bundle_stop = min(stop_position, bundle_start + desired_bundle_size)
      yield iobase.SourceBundle(
          weight=(bundle_stop - bundle_start),
          source=self,
          start_position=bundle_start,
          stop_position=bundle_stop)
      bundle_start = bundle_stop&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
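&lt;p>To see how the &lt;code>split&lt;/code> method above divides a range, the following sketch reproduces only its arithmetic in plain Python, without any Beam classes. The helper name &lt;code>bundle_ranges&lt;/code> is hypothetical and exists only for this illustration; it assumes the range starts at 0 and that &lt;code>desired_bundle_size&lt;/code> is positive.&lt;/p>

```python
def bundle_ranges(count, desired_bundle_size):
    """Yield (start, stop) pairs covering [0, count) in bundle-sized steps."""
    if 0 >= desired_bundle_size:
        raise ValueError('desired_bundle_size must be positive')
    bundle_start = 0
    while count > bundle_start:
        # The last bundle is truncated so that it never runs past count.
        bundle_stop = min(count, bundle_start + desired_bundle_size)
        yield bundle_start, bundle_stop
        bundle_start = bundle_stop


print(list(bundle_ranges(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```

&lt;p>Each pair corresponds to the &lt;code>start_position&lt;/code> and &lt;code>stop_position&lt;/code> of one &lt;code>SourceBundle&lt;/code>, and the bundle weights (4, 4, and 2 here) add up to the total count.&lt;/p>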
&lt;p>To read data from the source in your pipeline, use the &lt;code>Read&lt;/code> transform:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>with beam.Pipeline() as pipeline:
  numbers = pipeline | &amp;#39;ProduceNumbers&amp;#39; &amp;gt;&amp;gt; beam.io.Read(CountingSource(count))&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>&lt;strong>Note:&lt;/strong> When you create a source that end-users are going to use, we
recommend that you do not expose the code for the source itself as
demonstrated in the example above. Use a wrapping &lt;code>PTransform&lt;/code> instead.
&lt;a href="#ptransform-wrappers">PTransform wrappers&lt;/a> discusses why you should avoid
exposing your sources, and walks through how to create a wrapper.&lt;/p>
&lt;h2 id="using-the-filebasedsink-abstraction">Using the FileBasedSink abstraction&lt;/h2>
&lt;p>If your pipeline writes its output to files, you can implement the &lt;a href="https://beam.apache.org/releases/pydoc/2.56.0/apache_beam.io.filebasedsink.html">FileBasedSink&lt;/a>
abstraction to create a file-based sink. For other sinks, use &lt;code>ParDo&lt;/code>,
&lt;code>GroupByKey&lt;/code>, and other transforms offered by the Beam SDK for Python. See the
&lt;a href="/documentation/io/developing-io-overview/">developing I/O connectors overview&lt;/a>
for more details.&lt;/p>
&lt;p>When using the &lt;code>FileBasedSink&lt;/code> interface, you must provide the format-specific
logic that tells the runner how to write bounded data from your pipeline&amp;rsquo;s
&lt;code>PCollection&lt;/code>s to an output sink. The runner writes bundles of data in parallel
using multiple workers.&lt;/p>
&lt;p>Supply the logic for your file-based sink by implementing the following classes:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>A subclass of the abstract base class &lt;code>FileBasedSink&lt;/code>. &lt;code>FileBasedSink&lt;/code>
describes a location or resource that your pipeline can write to in
parallel. To avoid exposing your sink to end-users, use the &lt;code>_&lt;/code> prefix when
creating your &lt;code>FileBasedSink&lt;/code> subclass.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A user-facing wrapper &lt;code>PTransform&lt;/code> that, as part of the logic, calls
&lt;code>Write&lt;/code> and passes your &lt;code>FileBasedSink&lt;/code> as a parameter. A user should not
need to call &lt;code>Write&lt;/code> directly.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>The &lt;code>FileBasedSink&lt;/code> abstract base class implements code that is common to Beam
sinks that interact with files, including:&lt;/p>
&lt;ul>
&lt;li>Setting file headers and footers&lt;/li>
&lt;li>Sequential record writing&lt;/li>
&lt;li>Setting the output MIME type&lt;/li>
&lt;/ul>
&lt;p>&lt;code>FileBasedSink&lt;/code> and its subclasses support writing files to any Beam-supported
&lt;code>FileSystem&lt;/code> implementation. See the following Beam-provided &lt;code>FileBasedSink&lt;/code>
implementation for an example:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py">TextSink&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="ptransform-wrappers">PTransform wrappers&lt;/h2>
&lt;p>When you create a source or sink that end-users will use, avoid exposing your
source or sink code. To keep these classes private, name them with the
&lt;code>_&lt;/code> prefix, and then implement a user-facing wrapper &lt;code>PTransform&lt;/code>.
When you expose your source or sink as a transform, your implementation is
hidden and can be arbitrarily complex or simple. The greatest benefit of not
exposing implementation details is that you can add functionality later
without breaking the existing implementation for users.&lt;/p>
&lt;p>For example, if your users’ pipelines read from your source using
&lt;code>beam.io.Read&lt;/code> and you want to insert a reshard into the pipeline, all
users would need to add the reshard themselves (using the &lt;code>GroupByKey&lt;/code>
transform). To solve this, we recommend that you expose the source as a
composite &lt;code>PTransform&lt;/code> that performs both the read operation and the reshard.&lt;/p>
&lt;p>See Beam’s &lt;a href="/contribute/ptransform-style-guide/#exposing-a-ptransform-vs-something-else">PTransform style guide&lt;/a>
for additional information about wrapping with a &lt;code>PTransform&lt;/code>.&lt;/p>
&lt;p>The following examples change the source and sink from the above sections so
that they are not exposed to end-users. For the source, rename &lt;code>CountingSource&lt;/code>
to &lt;code>_CountingSource&lt;/code>. Then, create the wrapper &lt;code>PTransform&lt;/code>, called
&lt;code>ReadFromCountingSource&lt;/code>:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>class ReadFromCountingSource(PTransform):
  def __init__(self, count):
    super().__init__()
    self._count = count

  def expand(self, pcoll):
    return pcoll | iobase.Read(_CountingSource(self._count))&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>Finally, read from the source:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>with beam.Pipeline() as pipeline:
  numbers = pipeline | &amp;#39;ProduceNumbers&amp;#39; &amp;gt;&amp;gt; ReadFromCountingSource(count)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>For the sink, rename &lt;code>SimpleKVSink&lt;/code> to &lt;code>_SimpleKVSink&lt;/code>. Then, create the wrapper &lt;code>PTransform&lt;/code>, called &lt;code>WriteToKVSink&lt;/code>:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>class WriteToKVSink(PTransform):
  def __init__(self, simplekv, url, final_table_name):
    self._simplekv = simplekv
    super().__init__()
    self._url = url
    self._final_table_name = final_table_name

  def expand(self, pcoll):
    return pcoll | iobase.Write(
        _SimpleKVSink(self._simplekv, self._url, self._final_table_name))&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>Finally, write to the sink:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>with beam.Pipeline(options=PipelineOptions()) as pipeline:
  kvs = pipeline | &amp;#39;CreateKVs&amp;#39; &amp;gt;&amp;gt; beam.core.Create(KVs)
  kvs | &amp;#39;WriteToSimpleKV&amp;#39; &amp;gt;&amp;gt; WriteToKVSink(
      simplekv, &amp;#39;http://url_to_simple_kv/&amp;#39;, final_table_name)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div></description></item><item><title>Documentation: Apache Hadoop Input/Output Format IO</title><link>/documentation/io/built-in/hadoop/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/io/built-in/hadoop/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="hadoop-inputoutput-format-io">Hadoop Input/Output Format IO&lt;/h1>
&lt;blockquote>
&lt;p>&lt;strong>IMPORTANT!&lt;/strong> The previous implementation of Hadoop Input Format IO, called &lt;code>HadoopInputFormatIO&lt;/code>, is deprecated as of &lt;em>Apache Beam 2.10&lt;/em>. Use the current &lt;code>HadoopFormatIO&lt;/code>, which supports both &lt;code>InputFormat&lt;/code> and &lt;code>OutputFormat&lt;/code>.&lt;/p>
&lt;/blockquote>
&lt;p>&lt;code>HadoopFormatIO&lt;/code> is a transform for reading data from any source that implements Hadoop&amp;rsquo;s &lt;code>InputFormat&lt;/code>, or writing data to any sink that implements Hadoop&amp;rsquo;s &lt;code>OutputFormat&lt;/code>. Examples include Cassandra, Elasticsearch, HBase, Redis, and Postgres.&lt;/p>
&lt;p>&lt;code>HadoopFormatIO&lt;/code> allows you to connect to many data sources/sinks that do not yet have a Beam IO transform. However, &lt;code>HadoopFormatIO&lt;/code> has to make several performance trade-offs in connecting to &lt;code>InputFormat&lt;/code> or &lt;code>OutputFormat&lt;/code>. So, if there is another Beam IO transform for connecting specifically to your data source/sink of choice, we recommend you use that one.&lt;/p>
&lt;h3 id="reading-using-hadoopformatio">Reading using HadoopFormatIO&lt;/h3>
&lt;p>You will need to pass a Hadoop &lt;code>Configuration&lt;/code> with parameters specifying how the read will occur. Many properties of the &lt;code>Configuration&lt;/code> are optional and some are required for certain &lt;code>InputFormat&lt;/code> classes, but the following properties must be set for all &lt;code>InputFormat&lt;/code> classes:&lt;/p>
&lt;ul>
&lt;li>&lt;code>mapreduce.job.inputformat.class&lt;/code> - The &lt;code>InputFormat&lt;/code> class used to connect to your data source of choice.&lt;/li>
&lt;li>&lt;code>key.class&lt;/code> - The &lt;code>Key&lt;/code> class returned by the &lt;code>InputFormat&lt;/code> in &lt;code>mapreduce.job.inputformat.class&lt;/code>.&lt;/li>
&lt;li>&lt;code>value.class&lt;/code> - The &lt;code>Value&lt;/code> class returned by the &lt;code>InputFormat&lt;/code> in &lt;code>mapreduce.job.inputformat.class&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Configuration&lt;/span> &lt;span class="n">myHadoopConfiguration&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Configuration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Set Hadoop InputFormat, key and value class in configuration
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;mapreduce.job.inputformat.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">InputFormatClass&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">InputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;key.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">InputFormatKeyClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;value.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">InputFormatValueClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>You will need to check if the &lt;code>Key&lt;/code> and &lt;code>Value&lt;/code> classes output by the &lt;code>InputFormat&lt;/code> have a Beam &lt;code>Coder&lt;/code> available. If not, you can use &lt;code>withKeyTranslation&lt;/code> or &lt;code>withValueTranslation&lt;/code> to specify a method transforming instances of those classes into another class that is supported by a Beam &lt;code>Coder&lt;/code>. These settings are optional and you don&amp;rsquo;t need to specify translation for both key and value.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">SimpleFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">InputFormatKeyClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">MyKeyClass&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">myOutputKeyType&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">new&lt;/span> &lt;span class="n">SimpleFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">InputFormatKeyClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">MyKeyClass&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">MyKeyClass&lt;/span> &lt;span class="nf">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">InputFormatKeyClass&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...logic to transform InputFormatKeyClass to MyKeyClass
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">SimpleFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">InputFormatValueClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">MyValueClass&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">myOutputValueType&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">new&lt;/span> &lt;span class="n">SimpleFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">InputFormatValueClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">MyValueClass&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">MyValueClass&lt;/span> &lt;span class="nf">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">InputFormatValueClass&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...logic to transform InputFormatValueClass to MyValueClass
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">};&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h4 id="read-data-only-with-hadoop-configuration">Read data only with Hadoop configuration&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">InputFormatKeyClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">InputFormatValueClass&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfiguration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="read-data-with-configuration-and-key-translation">Read data with configuration and key translation&lt;/h4>
&lt;p>In this example, no Beam &lt;code>Coder&lt;/code> is available for the &lt;code>Key&lt;/code> class, so key translation is required.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">MyKeyClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">InputFormatValueClass&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfiguration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyTranslation&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">myOutputKeyType&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="read-data-with-configuration-and-value-translation">Read data with configuration and value translation&lt;/h4>
&lt;p>In this example, no Beam &lt;code>Coder&lt;/code> is available for the &lt;code>Value&lt;/code> class, so value translation is required.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">InputFormatKeyClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">MyValueClass&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfiguration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueTranslation&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">myOutputValueType&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="read-data-with-configuration-value-translation-and-key-translation">Read data with configuration, value translation and key translation&lt;/h4>
&lt;p>In this example, no Beam &lt;code>Coder&lt;/code> is available for either the &lt;code>Key&lt;/code> class or the &lt;code>Value&lt;/code> class of the &lt;code>InputFormat&lt;/code>, so both key and value translation are required.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">MyKeyClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">MyValueClass&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfiguration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyTranslation&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">myOutputKeyType&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueTranslation&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">myOutputValueType&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h1 id="examples-for-specific-inputformats">Examples for specific InputFormats&lt;/h1>
&lt;h3 id="cassandra---cqlinputformat">Cassandra - CqlInputFormat&lt;/h3>
&lt;p>To read data from Cassandra, use &lt;code>org.apache.cassandra.hadoop.cql3.CqlInputFormat&lt;/code>, which needs the following properties to be set:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Configuration&lt;/span> &lt;span class="n">cassandraConf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Configuration&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cassandraConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;cassandra.input.thrift.port&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;9160&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cassandraConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;cassandra.input.thrift.address&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CassandraHostIp&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cassandraConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;cassandra.input.partitioner.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;Murmur3Partitioner&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cassandraConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;cassandra.input.keyspace&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;myKeySpace&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cassandraConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;cassandra.input.columnfamily&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;myColumnFamily&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cassandraConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;key.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">java&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">lang&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Long&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cassandraConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;value.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">com&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">datastax&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">driver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">core&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Row&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cassandraConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;mapreduce.job.inputformat.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">cassandra&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">hadoop&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">cql3&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">CqlInputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">InputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Call the Read transform as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">cassandraData&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfiguration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">cassandraConf&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueTranslation&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">cassandraOutputValueType&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>The &lt;code>CqlInputFormat&lt;/code> key class is &lt;code>java.lang.Long&lt;/code>, which has a Beam &lt;code>Coder&lt;/code>. The &lt;code>CqlInputFormat&lt;/code> value class is &lt;code>com.datastax.driver.core.Row&lt;/code>, which does not have a Beam &lt;code>Coder&lt;/code>. Rather than write a new coder, you can provide your own translation method, as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">SimpleFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">cassandraOutputValueType&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">SimpleFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Row&lt;/span> &lt;span class="n">row&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">row&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getString&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;myColName&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">};&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
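&lt;p>The idea behind the value translation above can be sketched without any Cassandra or Beam dependency. In this minimal sketch, &lt;code>StandInRow&lt;/code> and &lt;code>RowToString&lt;/code> are hypothetical stand-ins for &lt;code>com.datastax.driver.core.Row&lt;/code> and &lt;code>SimpleFunction&lt;/code>; they are not part of either library. It only shows the shape of the technique: map a value type that has no coder to a type that already has one.&lt;/p>

```java
// Hypothetical stand-in for com.datastax.driver.core.Row (illustration only).
final class StandInRow {
    private final java.util.Properties columns = new java.util.Properties();

    // Adds a column value and returns this row so calls can be chained.
    StandInRow with(String column, String value) {
        columns.setProperty(column, value);
        return this;
    }

    // Returns the value stored for the column, or null if it is absent.
    String getString(String column) {
        return columns.getProperty(column);
    }
}

// Hypothetical translation contract: turns a row into a String.
interface RowToString {
    String apply(StandInRow row);
}

final class ValueTranslationSketch {
    // Same shape as cassandraOutputValueType: read one column from the row.
    static final RowToString COLUMN_AS_STRING = new RowToString() {
        @Override
        public String apply(StandInRow row) {
            return row.getString("myColName");
        }
    };

    public static void main(String[] args) {
        StandInRow row = new StandInRow().with("myColName", "hello");
        System.out.println(COLUMN_AS_STRING.apply(row));
    }
}
```

&lt;p>Because the translated value is a plain &lt;code>String&lt;/code>, no custom coder is needed for the resulting collection.&lt;/p>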
&lt;h3 id="elasticsearch---esinputformat">Elasticsearch - EsInputFormat&lt;/h3>
&lt;p>To read data from Elasticsearch, use &lt;code>EsInputFormat&lt;/code>, which needs the following properties to be set:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Configuration&lt;/span> &lt;span class="n">elasticsearchConf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Configuration&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">elasticsearchConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;es.nodes&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ElasticsearchHostIp&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">elasticsearchConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;es.port&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;9200&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">elasticsearchConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;es.resource&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;ElasticIndexName/ElasticTypeName&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">elasticsearchConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;key.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">hadoop&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Text&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">elasticsearchConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;value.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">elasticsearch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">hadoop&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">mr&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">LinkedMapWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">elasticsearchConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;mapreduce.job.inputformat.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">elasticsearch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">hadoop&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">mr&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">EsInputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">InputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Call the Read transform as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Text&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LinkedMapWritable&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">elasticData&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">Text&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LinkedMapWritable&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">withConfiguration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">elasticsearchConf&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>The &lt;code>org.elasticsearch.hadoop.mr.EsInputFormat&lt;/code> key class is &lt;code>org.apache.hadoop.io.Text&lt;/code>, and its value class is &lt;code>org.elasticsearch.hadoop.mr.LinkedMapWritable&lt;/code>. Both key and value classes have Beam Coders.&lt;/p>
&lt;h3 id="hcatalog---hcatinputformat">HCatalog - HCatInputFormat&lt;/h3>
&lt;p>To read data using HCatalog, use &lt;code>org.apache.hive.hcatalog.mapreduce.HCatInputFormat&lt;/code>, which needs the following properties to be set:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Configuration&lt;/span> &lt;span class="n">hcatConf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Configuration&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hcatConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;mapreduce.job.inputformat.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">HCatInputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">InputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hcatConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;key.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LongWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hcatConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;value.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">HCatRecord&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hcatConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;hive.metastore.uris&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;thrift://metastore-host:port&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">hive&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">hcatalog&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">mapreduce&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">HCatInputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setInput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">hcatConf&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;my_database&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;my_table&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;my_filter&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Call the Read transform as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">LongWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">HCatRecord&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">hcatData&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">LongWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">HCatRecord&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfiguration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">hcatConf&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
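&lt;p>The Cassandra, Elasticsearch, and HCatalog configurations above all set the same three keys: &lt;code>mapreduce.job.inputformat.class&lt;/code>, &lt;code>key.class&lt;/code>, and &lt;code>value.class&lt;/code>. As a minimal sketch, a pre-flight check for those keys could look like the following. It uses &lt;code>java.util.Properties&lt;/code> as a stand-in for the Hadoop &lt;code>Configuration&lt;/code> so that it runs without Hadoop on the classpath; the class &lt;code>RequiredKeysSketch&lt;/code> and its &lt;code>missingKeys&lt;/code> helper are hypothetical names, not part of Beam or Hadoop.&lt;/p>

```java
final class RequiredKeysSketch {
    // The three keys that each configuration example above sets.
    static final String[] REQUIRED_KEYS = {
        "mapreduce.job.inputformat.class", "key.class", "value.class"
    };

    // Returns a comma-separated list of the required keys that are not set,
    // or an empty string when all of them are present.
    static String missingKeys(java.util.Properties conf) {
        StringBuilder missing = new StringBuilder();
        for (String key : REQUIRED_KEYS) {
            if (conf.getProperty(key) == null) {
                if (missing.length() != 0) {
                    missing.append(", ");
                }
                missing.append(key);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // Properties stands in for org.apache.hadoop.conf.Configuration here.
        java.util.Properties conf = new java.util.Properties();
        conf.setProperty("key.class", "org.apache.hadoop.io.LongWritable");
        conf.setProperty("value.class", "org.apache.hive.hcatalog.data.HCatRecord");
        System.out.println("Missing: " + missingKeys(conf));
    }
}
```

&lt;p>Running a check like this before building the pipeline surfaces an incomplete configuration early, with the names of the keys that still need values.&lt;/p>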
&lt;h3 id="amazon-dynamodb---dynamodbinputformat">Amazon DynamoDB - DynamoDBInputFormat&lt;/h3>
&lt;p>To read data from Amazon DynamoDB, use &lt;code>org.apache.hadoop.dynamodb.read.DynamoDBInputFormat&lt;/code>.
&lt;code>DynamoDBInputFormat&lt;/code> implements the older &lt;code>org.apache.hadoop.mapred.InputFormat&lt;/code> interface, whereas HadoopFormatIO uses the newer abstract class &lt;code>org.apache.hadoop.mapreduce.InputFormat&lt;/code>.
To make the two compatible, you need a wrapper API that acts as an adapter between HadoopFormatIO and &lt;code>DynamoDBInputFormat&lt;/code> (or, in general, any InputFormat that implements &lt;code>org.apache.hadoop.mapred.InputFormat&lt;/code>).
The example below uses one such wrapper API: &lt;a href="https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java">https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java&lt;/a>&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Configuration&lt;/span> &lt;span class="n">dynamoDBConf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Configuration&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">Job&lt;/span> &lt;span class="n">job&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Job&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getInstance&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">com&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">twitter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">elephantbird&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">mapreduce&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">input&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">MapReduceInputFormatWrapper&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setInputFormat&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">hadoop&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">dynamodb&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">DynamoDBInputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">job&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">job&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getConfiguration&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;key.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Text&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">WritableComparable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;value.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">hadoop&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">dynamodb&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">DynamoDBItemWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Writable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;dynamodb.servicename&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;dynamodb&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;dynamodb.input.tableName&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;table_name&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;dynamodb.endpoint&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;dynamodb.us-west-1.amazonaws.com&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;dynamodb.regionid&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;us-west-1&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;dynamodb.throughput.read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;dynamodb.throughput.read.percent&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;dynamodb.version&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;2011-12-05&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">DynamoDBConstants&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">DYNAMODB_ACCESS_KEY_CONF&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;aws_access_key&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">DynamoDBConstants&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">DYNAMODB_SECRET_KEY_CONF&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;aws_secret_key&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Call the Read transform as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Text&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">DynamoDBItemWritable&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">dynamoDBData&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">Text&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">DynamoDBItemWritable&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfiguration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">dynamoDBConf&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="apache-hbase---tablesnapshotinputformat">Apache HBase - TableSnapshotInputFormat&lt;/h3>
&lt;p>To read data from an HBase table snapshot, use &lt;code>org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat&lt;/code>.
Reading from a table snapshot bypasses the HBase region servers, instead reading HBase data files directly from the filesystem.
This is useful for cases such as reading historical data or offloading work from the HBase cluster.
In some scenarios, this may be faster than accessing content through the region servers with &lt;code>HBaseIO&lt;/code>.&lt;/p>
&lt;p>A table snapshot can be taken using the HBase shell or programmatically:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="k">try&lt;/span> &lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Connection&lt;/span> &lt;span class="n">connection&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ConnectionFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">createConnection&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">hbaseConf&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Admin&lt;/span> &lt;span class="n">admin&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">connection&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getAdmin&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">admin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">snapshot&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;my_snapshot&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TableName&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">valueOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;my_table&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HBaseProtos&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">SnapshotDescription&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Type&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">FLUSH&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>A &lt;code>TableSnapshotInputFormat&lt;/code> is configured as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Construct a typical HBase scan
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">Scan&lt;/span> &lt;span class="n">scan&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Scan&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">scan&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setCaching&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1000&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">scan&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setBatch&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1000&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">scan&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">addColumn&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Bytes&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">toBytes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;CF&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span> &lt;span class="n">Bytes&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">toBytes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;col_1&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">scan&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">addColumn&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Bytes&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">toBytes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;CF&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span> &lt;span class="n">Bytes&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">toBytes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;col_2&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">Configuration&lt;/span> &lt;span class="n">hbaseConf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">HBaseConfiguration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hbaseConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">HConstants&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ZOOKEEPER_QUORUM&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;zk1:2181&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hbaseConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;hbase.rootdir&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;/hbase&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hbaseConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;mapreduce.job.inputformat.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">TableSnapshotInputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">InputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hbaseConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;key.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ImmutableBytesWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Writable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hbaseConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;value.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Result&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Writable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ClientProtos&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Scan&lt;/span> &lt;span class="n">proto&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ProtobufUtil&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">toScan&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">scan&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hbaseConf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TableInputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">SCAN&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Base64&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">encodeBytes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">proto&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">toByteArray&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Make use of existing utility methods
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">Job&lt;/span> &lt;span class="n">job&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Job&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getInstance&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">hbaseConf&lt;/span>&lt;span class="o">);&lt;/span> &lt;span class="c1">// creates internal clone of hbaseConf
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">TableSnapshotInputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setInput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">job&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;my_snapshot&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Path&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;/tmp/snapshot_restore&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">hbaseConf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">job&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getConfiguration&lt;/span>&lt;span class="o">();&lt;/span> &lt;span class="c1">// extract the modified clone
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Call the Read transform as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ImmutableBytesWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Result&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">hbaseSnapshotData&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">ImmutableBytesWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Result&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfiguration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">hbaseConf&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="writing-using-hadoopformatio">Writing using HadoopFormatIO&lt;/h3>
&lt;p>You will need to pass a Hadoop &lt;code>Configuration&lt;/code> with parameters specifying how the write will occur. Many properties of the &lt;code>Configuration&lt;/code> are optional, and some are required for certain &lt;code>OutputFormat&lt;/code> classes, but the following properties must be set for all &lt;code>OutputFormat&lt;/code>s:&lt;/p>
&lt;ul>
&lt;li>&lt;code>mapreduce.job.id&lt;/code> - The identifier of the write job, for example the end timestamp of the window.&lt;/li>
&lt;li>&lt;code>mapreduce.job.outputformat.class&lt;/code> - The &lt;code>OutputFormat&lt;/code> class used to connect to your data sink of choice.&lt;/li>
&lt;li>&lt;code>mapreduce.job.output.key.class&lt;/code> - The key class passed to the &lt;code>OutputFormat&lt;/code> in &lt;code>mapreduce.job.outputformat.class&lt;/code>.&lt;/li>
&lt;li>&lt;code>mapreduce.job.output.value.class&lt;/code> - The value class passed to the &lt;code>OutputFormat&lt;/code> in &lt;code>mapreduce.job.outputformat.class&lt;/code>.&lt;/li>
&lt;li>&lt;code>mapreduce.job.reduces&lt;/code> - The number of reduce tasks, which equals the number of write tasks that will be generated. This property is not required for writes configured with &lt;code>Write.PartitionedWriterBuilder#withoutPartitioning()&lt;/code>.&lt;/li>
&lt;li>&lt;code>mapreduce.job.partitioner.class&lt;/code> - The Hadoop partitioner class used to distribute records among partitions. This property is not required for writes configured with &lt;code>Write.PartitionedWriterBuilder#withoutPartitioning()&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>&lt;em>Note&lt;/em>: All of the property names above have corresponding constants, for example &lt;code>HadoopFormatIO.OUTPUT_FORMAT_CLASS_ATTR&lt;/code>.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Configuration&lt;/span> &lt;span class="n">myHadoopConfiguration&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Configuration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Set Hadoop OutputFormat, key and value class in configuration
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;mapreduce.job.outputformat.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyDbOutputFormatClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">OutputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;mapreduce.job.output.key.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyDbOutputFormatKeyClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;mapreduce.job.output.value.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyDbOutputFormatValueClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;mapreduce.job.partitioner.class&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyPartitionerClass&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">setInt&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;mapreduce.job.reduces&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">2&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
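&lt;p>As a rough illustration of the required properties listed above, the following sketch checks a set of properties before a write is configured. It is a minimal sketch in plain Java, not part of Beam or Hadoop: the class &lt;code>WriteConfigCheck&lt;/code>, its &lt;code>missing&lt;/code> method, the use of &lt;code>java.util.Properties&lt;/code> in place of a Hadoop &lt;code>Configuration&lt;/code>, and the &lt;code>com.example&lt;/code> class names are all assumptions made for illustration.&lt;/p>

```java
import java.util.Properties;

/** Hypothetical helper for illustration only; not part of Beam or Hadoop. */
final class WriteConfigCheck {
  // Property names copied from the list of required properties above.
  static final String[] ALWAYS_REQUIRED = {
    "mapreduce.job.id",
    "mapreduce.job.outputformat.class",
    "mapreduce.job.output.key.class",
    "mapreduce.job.output.value.class"
  };

  // Only needed when the write is partitioned.
  static final String[] PARTITIONED_ONLY = {
    "mapreduce.job.reduces",
    "mapreduce.job.partitioner.class"
  };

  /** Returns a comma-separated list of missing property names, or "" if none are missing. */
  static String missing(Properties conf, boolean partitioned) {
    StringBuilder sb = new StringBuilder();
    collect(conf, ALWAYS_REQUIRED, sb);
    if (partitioned) {
      collect(conf, PARTITIONED_ONLY, sb);
    }
    return sb.toString();
  }

  // Appends every key that has no value in conf to sb.
  private static void collect(Properties conf, String[] keys, StringBuilder sb) {
    for (String key : keys) {
      if (conf.getProperty(key) == null) {
        if (sb.length() > 0) {
          sb.append(", ");
        }
        sb.append(key);
      }
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("mapreduce.job.id", "1700000000000"); // placeholder job id
    conf.setProperty("mapreduce.job.outputformat.class", "com.example.MyDbOutputFormat");
    conf.setProperty("mapreduce.job.output.key.class", "com.example.MyKey");
    conf.setProperty("mapreduce.job.output.value.class", "com.example.MyValue");

    System.out.println("missing without partitioning: [" + missing(conf, false) + "]");
    System.out.println("missing with partitioning: [" + missing(conf, true) + "]");
  }
}
```

&lt;p>A check like this only confirms that the properties are present. Whether the values are valid still depends on the &lt;code>OutputFormat&lt;/code> in use.&lt;/p>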
&lt;p>You must set the &lt;code>OutputFormat&lt;/code> key and value classes (that is, &amp;ldquo;mapreduce.job.output.key.class&amp;rdquo; and &amp;ldquo;mapreduce.job.output.value.class&amp;rdquo;) in the Hadoop &lt;code>Configuration&lt;/code> to match &lt;code>KeyT&lt;/code> and &lt;code>ValueT&lt;/code>. If the configured key or value class differs from the &lt;code>OutputFormat&lt;/code>&amp;rsquo;s actual key or value class, an &lt;code>IllegalArgumentException&lt;/code> is thrown.&lt;/p>
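&lt;p>The kind of check described above can be pictured with the following minimal sketch. It is written in plain Java and is not Beam&amp;rsquo;s actual validation code: the class &lt;code>OutputClassCheck&lt;/code> and the method &lt;code>requireMatches&lt;/code> are hypothetical, and comparing a configured class name against the runtime class of a sample element is an assumption made to keep the example self-contained.&lt;/p>

```java
/** Hypothetical illustration of a key/value class check; not Beam's actual code. */
final class OutputClassCheck {

  /**
   * Throws IllegalArgumentException when the class name configured under
   * the given property differs from the runtime class of the sample element.
   */
  static void requireMatches(String property, String configuredClassName, Object sample) {
    String actualName = sample.getClass().getName();
    if (!configuredClassName.equals(actualName)) {
      throw new IllegalArgumentException(
          property + " is set to " + configuredClassName
              + " but the element type is " + actualName);
    }
  }

  public static void main(String[] args) {
    // Matching key class: no exception is thrown.
    requireMatches("mapreduce.job.output.key.class", "java.lang.String", "word");

    // Mismatched value class: the configuration says Long, the element is an Integer.
    try {
      requireMatches("mapreduce.job.output.value.class", "java.lang.Long", Integer.valueOf(1));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```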
&lt;h4 id="batch-writing">Batch writing&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Data that we want to write
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Text&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LongWritable&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">boundedWordsCount&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Hadoop configuration for write
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// This is a partitioned write, so the Partitioner and the reducer count must be set; see the withPartitioning() Javadoc
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">Configuration&lt;/span> &lt;span class="n">myHadoopConfiguration&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Path to directory with locks
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">locksDirPath&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">boundedWordsCount&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;writeBatch&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">Text&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LongWritable&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfiguration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">myHadoopConfiguration&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPartitioning&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withExternalSynchronization&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">HDFSSynchronization&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">locksDirPath&lt;/span>&lt;span class="o">)));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
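&lt;p>To make the role of the partitioner and the reducer count in a partitioned write more concrete, the following is a minimal sketch of how a hash-style partitioner could assign keys to write tasks. The class &lt;code>PartitionSketch&lt;/code> and the method &lt;code>partitionFor&lt;/code> are hypothetical names used only for illustration. The sketch uses plain Java with &lt;code>Math.floorMod&lt;/code> and does not reproduce the code of any Hadoop &lt;code>Partitioner&lt;/code> implementation.&lt;/p>

```java
/** Hypothetical illustration of hash-style partitioning; not a Hadoop Partitioner. */
final class PartitionSketch {

  /**
   * Maps a key to one of numPartitions write tasks.
   * floorMod keeps the result non-negative even when hashCode() is negative.
   */
  static int partitionFor(Object key, int numPartitions) {
    return Math.floorMod(key.hashCode(), numPartitions);
  }

  public static void main(String[] args) {
    int reduces = 2; // plays the role of mapreduce.job.reduces
    String[] words = {"beam", "hadoop", "write"};
    for (String word : words) {
      // Equal keys always land in the same partition.
      System.out.println(word + " -> partition " + partitionFor(word, reduces));
    }
  }
}
```

&lt;p>In this sketch the number of distinct partition indexes equals the configured reducer count, which mirrors the statement that &lt;code>mapreduce.job.reduces&lt;/code> equals the number of write tasks.&lt;/p>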
&lt;h4 id="stream-writing">Stream writing&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Data that we want to write
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Text&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LongWritable&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">unboundedWordsCount&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Transform that maps the data of one window to a single Hadoop configuration
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;?&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Text&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LongWritable&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;,&lt;/span> &lt;span class="n">PCollectionView&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Configuration&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">configTransform&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">unboundedWordsCount&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;writeStream&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HadoopFormatIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">Text&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LongWritable&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfigurationTransform&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">configTransform&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withExternalSynchronization&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">HDFSSynchronization&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">locksDirPath&lt;/span>&lt;span class="o">)));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support Hadoop Input/Output Format IO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div></description></item><item><title>Documentation: Apache HCatalog I/O connector</title><link>/documentation/io/built-in/hcatalog/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/io/built-in/hcatalog/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="hcatalog-io">HCatalog IO&lt;/h1>
&lt;p>&lt;code>HCatalogIO&lt;/code> is a transform for reading data from and writing data to an HCatalog-managed source.&lt;/p>
&lt;h3 id="reading-using-hcatalogio">Reading using HCatalogIO&lt;/h3>
&lt;p>To configure an HCatalog source, you must specify a metastore URI and a table name. Optional parameters are the database and a filter.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">configProperties&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">HashMap&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">configProperties&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">put&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;hive.metastore.uris&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>&lt;span class="s">&amp;#34;thrift://metastore-host:port&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">HCatalogIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfigProperties&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">configProperties&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withDatabase&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;default&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="c1">//optional, assumes default if none specified
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">withTable&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;employee&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withFilter&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">filterString&lt;/span>&lt;span class="o">))&lt;/span> &lt;span class="c1">//optional, may be specified if the table is partitioned
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support HCatalogIO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="writing-using-hcatalogio">Writing using HCatalogIO&lt;/h3>
&lt;p>To configure an &lt;code>HCatalog&lt;/code> sink, you must specify a metastore URI and a table name.
Optional parameters are the database, partition, and batch size.
The destination table must already exist, because the transform does not create a missing table.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">configProperties&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">HashMap&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">configProperties&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">put&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;hive.metastore.uris&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>&lt;span class="s">&amp;#34;thrift://metastore-host:port&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(...)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">HCatalogIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withConfigProperties&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">configProperties&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withDatabase&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;default&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="c1">//optional, assumes default if none specified
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">withTable&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;employee&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPartition&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">partitionValues&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="c1">//optional, may be specified if the table is partitioned
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">withBatchSize&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1024L&lt;/span>&lt;span class="o">))&lt;/span> &lt;span class="c1">//optional, assumes a default batch size of 1024 if none specified
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The Beam SDK for Python does not support HCatalogIO.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h3 id="using-older-versions-of-hcatalog-1x">Using older versions of HCatalog (1.x)&lt;/h3>
&lt;p>&lt;code>HCatalogIO&lt;/code> is built for Apache HCatalog versions 2 and up and will not work out of the box with older versions of HCatalog.
The following illustrates a workaround for Hive 1.1.&lt;/p>
&lt;p>Include the following Hive 1.2 jars in the uber jar you build.
The 1.2 jars provide the methods that Beam needs while remaining compatible with Hive 1.1.&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;lt;dependency&amp;gt;
&amp;lt;groupId&amp;gt;org.apache.beam&amp;lt;/groupId&amp;gt;
&amp;lt;artifactId&amp;gt;beam-sdks-java-io-hcatalog&amp;lt;/artifactId&amp;gt;
&amp;lt;version&amp;gt;${beam.version}&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
&amp;lt;groupId&amp;gt;org.apache.hive.hcatalog&amp;lt;/groupId&amp;gt;
&amp;lt;artifactId&amp;gt;hive-hcatalog-core&amp;lt;/artifactId&amp;gt;
&amp;lt;version&amp;gt;1.2&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
&amp;lt;groupId&amp;gt;org.apache.hive&amp;lt;/groupId&amp;gt;
&amp;lt;artifactId&amp;gt;hive-metastore&amp;lt;/artifactId&amp;gt;
&amp;lt;version&amp;gt;1.2&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
&amp;lt;groupId&amp;gt;org.apache.hive&amp;lt;/groupId&amp;gt;
&amp;lt;artifactId&amp;gt;hive-exec&amp;lt;/artifactId&amp;gt;
&amp;lt;version&amp;gt;1.2&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
&amp;lt;groupId&amp;gt;org.apache.hive&amp;lt;/groupId&amp;gt;
&amp;lt;artifactId&amp;gt;hive-common&amp;lt;/artifactId&amp;gt;
&amp;lt;version&amp;gt;1.2&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code>&lt;/pre>&lt;p>Relocate &lt;em>only&lt;/em> the following Hive packages:&lt;/p>
&lt;pre tabindex="0">&lt;code>&amp;lt;plugin&amp;gt;
&amp;lt;groupId&amp;gt;org.apache.maven.plugins&amp;lt;/groupId&amp;gt;
&amp;lt;artifactId&amp;gt;maven-shade-plugin&amp;lt;/artifactId&amp;gt;
&amp;lt;version&amp;gt;${maven-shade-plugin.version}&amp;lt;/version&amp;gt;
&amp;lt;configuration&amp;gt;
&amp;lt;createDependencyReducedPom&amp;gt;false&amp;lt;/createDependencyReducedPom&amp;gt;
&amp;lt;filters&amp;gt;
&amp;lt;filter&amp;gt;
&amp;lt;artifact&amp;gt;*:*&amp;lt;/artifact&amp;gt;
&amp;lt;excludes&amp;gt;
&amp;lt;exclude&amp;gt;META-INF/*.SF&amp;lt;/exclude&amp;gt;
&amp;lt;exclude&amp;gt;META-INF/*.DSA&amp;lt;/exclude&amp;gt;
&amp;lt;exclude&amp;gt;META-INF/*.RSA&amp;lt;/exclude&amp;gt;
&amp;lt;/excludes&amp;gt;
&amp;lt;/filter&amp;gt;
&amp;lt;/filters&amp;gt;
&amp;lt;/configuration&amp;gt;
&amp;lt;executions&amp;gt;
&amp;lt;execution&amp;gt;
&amp;lt;phase&amp;gt;package&amp;lt;/phase&amp;gt;
&amp;lt;goals&amp;gt;
&amp;lt;goal&amp;gt;shade&amp;lt;/goal&amp;gt;
&amp;lt;/goals&amp;gt;
&amp;lt;configuration&amp;gt;
&amp;lt;shadedArtifactAttached&amp;gt;true&amp;lt;/shadedArtifactAttached&amp;gt;
&amp;lt;shadedClassifierName&amp;gt;shaded&amp;lt;/shadedClassifierName&amp;gt;
&amp;lt;transformers&amp;gt;
&amp;lt;transformer implementation=&amp;#34;org.apache.maven.plugins.shade.resource.ServicesResourceTransformer&amp;#34;/&amp;gt;
&amp;lt;/transformers&amp;gt;
&amp;lt;relocations&amp;gt;
&amp;lt;!-- Important: Do not relocate org.apache.hadoop.hive --&amp;gt;
&amp;lt;relocation&amp;gt;
&amp;lt;pattern&amp;gt;org.apache.hadoop.hive.conf&amp;lt;/pattern&amp;gt;
&amp;lt;shadedPattern&amp;gt;h12.org.apache.hadoop.hive.conf&amp;lt;/shadedPattern&amp;gt;
&amp;lt;/relocation&amp;gt;
&amp;lt;relocation&amp;gt;
&amp;lt;pattern&amp;gt;org.apache.hadoop.hive.ql&amp;lt;/pattern&amp;gt;
&amp;lt;shadedPattern&amp;gt;h12.org.apache.hadoop.hive.ql&amp;lt;/shadedPattern&amp;gt;
&amp;lt;/relocation&amp;gt;
&amp;lt;relocation&amp;gt;
&amp;lt;pattern&amp;gt;org.apache.hadoop.hive.metastore&amp;lt;/pattern&amp;gt;
&amp;lt;shadedPattern&amp;gt;h12.org.apache.hadoop.hive.metastore&amp;lt;/shadedPattern&amp;gt;
&amp;lt;/relocation&amp;gt;
&amp;lt;/relocations&amp;gt;
&amp;lt;/configuration&amp;gt;
&amp;lt;/execution&amp;gt;
&amp;lt;/executions&amp;gt;
&amp;lt;/plugin&amp;gt;
&lt;/code>&lt;/pre>&lt;p>This workaround has been tested by reading tables backed by SequenceFile and ORCFile files, running with
Beam 2.4.0 on Spark 2.3 / YARN in a Cloudera CDH 5.12.2 managed environment.&lt;/p></description></item><item><title>Documentation: Apache Parquet I/O connector</title><link>/documentation/io/built-in/parquet/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/io/built-in/parquet/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;p>&lt;a href="/documentation/io/built-in/">Built-in I/O Transforms&lt;/a>&lt;/p>
&lt;h1 id="apache-parquet-io-connector">Apache Parquet I/O connector&lt;/h1>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;p>The Beam SDKs include built-in transforms that can read data from and write data
to &lt;a href="https://parquet.apache.org">Apache Parquet&lt;/a> files.&lt;/p>
&lt;h2 id="before-you-start">Before you start&lt;/h2>
&lt;!-- Java specific -->
&lt;p class="language-java">To use ParquetIO, add the Maven artifact dependency to your &lt;code>pom.xml&lt;/code> file.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">dependency&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">groupId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">beam&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">groupId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">artifactId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">sdks&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">java&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">parquet&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">artifactId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">version&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">56&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">0&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">version&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">dependency&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">Additional resources:&lt;/p>
&lt;span class="language-java">&lt;ul>
&lt;li>&lt;a href="https://github.com/apache/beam/blob/master/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java">ParquetIO source code&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/javadoc/2.56.0/org/apache/beam/sdk/io/parquet/ParquetIO.html">ParquetIO Javadoc&lt;/a>&lt;/li>
&lt;/ul>
&lt;/span>
&lt;!-- Python specific -->
&lt;p class="language-py">ParquetIO comes preinstalled with the Apache Beam Python SDK 2.56.0.&lt;/p>
&lt;p class="language-py">Additional resources:&lt;/p>
&lt;span class="language-py">&lt;ul>
&lt;li>&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/parquetio.py">ParquetIO source code&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/pydoc/2.56.0/apache_beam.io.parquetio.html">ParquetIO Pydoc&lt;/a>&lt;/li>
&lt;/ul>
&lt;/span>
&lt;p class="language-java">&lt;h4 id="using-parquetio-with-spark-before-24">Using ParquetIO with Spark before 2.4&lt;/h4>
&lt;/p>
&lt;p class="language-java">&lt;code>ParquetIO&lt;/code> depends on an API introduced in Apache Parquet 1.10.0. &lt;strong>Spark 2.4.x is compatible and no additional steps are necessary&lt;/strong>. Older versions of Spark will not work out of the box, because the Parquet libraries pre-installed with Spark take precedence during execution. For those versions, apply the following workaround.&lt;/p>
&lt;p class="language-java">&lt;blockquote>
&lt;p>&lt;strong>Note&lt;/strong>: The following technique allows you to execute your pipeline with &lt;code>ParquetIO&lt;/code> correctly.
The Parquet files that are consumed or generated by this Beam connector should remain interoperable with the other tools on your cluster.&lt;/p>
&lt;/blockquote>
&lt;/p>
&lt;p class="language-java">Include the Parquet artifact normally and ensure that it brings in the correct version of Parquet as a transitive dependency.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">dependency&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">groupId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">beam&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">groupId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">artifactId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">sdks&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">java&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">parquet&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">artifactId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">version&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">$&lt;/span>&lt;span class="o">{&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">version&lt;/span>&lt;span class="o">}&amp;lt;/&lt;/span>&lt;span class="n">version&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">dependency&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">Relocate the following packages:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">plugin&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">groupId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">maven&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">plugins&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">groupId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">artifactId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">maven&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">shade&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">plugin&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">artifactId&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">configuration&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">createDependencyReducedPom&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="kc">false&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">createDependencyReducedPom&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">filters&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">filter&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">artifact&lt;/span>&lt;span class="o">&amp;gt;*:*&amp;lt;/&lt;/span>&lt;span class="n">artifact&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">excludes&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">exclude&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">META&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">INF&lt;/span>&lt;span class="o">/*.&lt;/span>&lt;span class="na">SF&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">exclude&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">exclude&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">META&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">INF&lt;/span>&lt;span class="o">/*.&lt;/span>&lt;span class="na">DSA&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">exclude&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">exclude&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">META&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">INF&lt;/span>&lt;span class="o">/*.&lt;/span>&lt;span class="na">RSA&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">exclude&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">excludes&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">filter&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">filters&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">configuration&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">executions&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">execution&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">phase&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">package&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">phase&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">goals&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">goal&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">shade&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">goal&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">goals&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">configuration&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">shadedArtifactAttached&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">shadedArtifactAttached&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">shadedClassifierName&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">shaded&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">shadedClassifierName&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">relocations&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">relocation&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">pattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">parquet&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">pattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">shadedPattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">shaded&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">parquet&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">shadedPattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">relocation&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;!--&lt;/span> &lt;span class="n">Some&lt;/span> &lt;span class="n">packages&lt;/span> &lt;span class="n">are&lt;/span> &lt;span class="n">shaded&lt;/span> &lt;span class="n">already&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">and&lt;/span> &lt;span class="n">on&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">original&lt;/span> &lt;span class="n">spark&lt;/span> &lt;span class="n">classpath&lt;/span>&lt;span class="o">.&lt;/span> &lt;span class="n">Shade&lt;/span> &lt;span class="n">them&lt;/span> &lt;span class="n">more&lt;/span>&lt;span class="o">.&lt;/span> &lt;span class="o">--&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">relocation&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">pattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">shaded&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">parquet&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">pattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">shadedPattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">reshaded&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">parquet&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">shadedPattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">relocation&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">relocation&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">pattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">avro&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">pattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">shadedPattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">shaded&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">avro&lt;/span>&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">shadedPattern&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">relocation&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">relocations&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">transformers&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">transformer&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">implementation&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s">&amp;#34;org.apache.maven.plugins.shade.resource.ServicesResourceTransformer&amp;#34;&lt;/span>&lt;span class="o">/&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">transformers&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">configuration&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">execution&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">executions&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">&amp;lt;/&lt;/span>&lt;span class="n">plugin&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">This technique has been tested to work on Spark 2.2.3, Spark 2.3.3 and Spark 2.4.3 (although it is optional for Spark 2.4+).&lt;/p></description></item><item><title>Documentation: Apache SingleStore I/O connector</title><link>/documentation/io/built-in/singlestore/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/io/built-in/singlestore/</guid><description>
&lt;h1 id="singlestoredb-io">SingleStoreDB I/O&lt;/h1>
&lt;p>Pipeline options and general information about using and running SingleStoreDB I/O.&lt;/p>
&lt;h2 id="before-you-start">Before you start&lt;/h2>
&lt;p>To use SingleStoreDB I/O, add the Maven artifact dependency to your &lt;code>pom.xml&lt;/code> file.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>&amp;lt;dependency&amp;gt;
&amp;lt;groupId&amp;gt;org.apache.beam&amp;lt;/groupId&amp;gt;
&amp;lt;artifactId&amp;gt;beam-sdks-java-io-singlestore&amp;lt;/artifactId&amp;gt;
&amp;lt;version&amp;gt;2.56.0&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>Additional resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/apache/beam/tree/master/sdks/java/io/singlestore/src/main/java/org/apache/beam/sdk/io/singlestore">SingleStoreIO source code&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/javadoc/2.56.0/org/apache/beam/sdk/io/singlestore/SingleStoreIO.html">SingleStoreIO Javadoc&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://docs.singlestore.com/">SingleStore documentation&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="authentication">Authentication&lt;/h2>
&lt;p>DataSource configuration is required for configuring SingleStoreIO connection properties.&lt;/p>
&lt;p>Create the DataSource configuration:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>SingleStoreIO.DataSourceConfiguration
.create(&amp;#34;myHost:3306&amp;#34;)
.withDatabase(&amp;#34;db&amp;#34;)
.withConnectionProperties(&amp;#34;connectTimeout=30000;useServerPrepStmts=FALSE&amp;#34;)
.withPassword(&amp;#34;password&amp;#34;)
.withUsername(&amp;#34;admin&amp;#34;);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>Where parameters can be:&lt;/p>
&lt;ul>
&lt;li>&lt;code>.create(endpoint)&lt;/code>
&lt;ul>
&lt;li>Hostname or IP address of the SingleStoreDB instance, in the form &lt;code>host[:port]&lt;/code> (the port is optional).&lt;/li>
&lt;li>Required parameter.&lt;/li>
&lt;li>Example: &lt;code>.create(&amp;quot;myHost:3306&amp;quot;)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withUsername(username)&lt;/code>
&lt;ul>
&lt;li>SingleStoreDB username.&lt;/li>
&lt;li>Default - &lt;code>root&lt;/code>.&lt;/li>
&lt;li>Example: &lt;code>.withUsername(&amp;quot;USERNAME&amp;quot;)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withPassword(password)&lt;/code>
&lt;ul>
&lt;li>Password of the SingleStoreDB user.&lt;/li>
&lt;li>Default - empty String.&lt;/li>
&lt;li>Example: &lt;code>.withPassword(&amp;quot;PASSWORD&amp;quot;)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withDatabase(database)&lt;/code>
&lt;ul>
&lt;li>Name of the SingleStoreDB database to use.&lt;/li>
&lt;li>Example: &lt;code>.withDatabase(&amp;quot;MY_DATABASE&amp;quot;)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withConnectionProperties(connectionProperties)&lt;/code>
&lt;ul>
&lt;li>List of properties used by the JDBC driver.&lt;/li>
&lt;li>The format is “key1=value1;key2=value2;&amp;hellip;”.&lt;/li>
&lt;li>For the full list of supported properties, see the &lt;a href="https://docs.singlestore.com/managed-service/en/developer-resources/connect-with-application-development-tools/connect-with-java-jdbc/the-singlestore-jdbc-driver.html#connection-string-parameters">SingleStore JDBC driver connection string parameters&lt;/a>.&lt;/li>
&lt;li>Example: &lt;code>.withConnectionProperties(&amp;quot;connectTimeout=30000;useServerPrepStmts=FALSE&amp;quot;)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note&lt;/strong> - &lt;code>.withDatabase(...)&lt;/code> &lt;strong>is required for &lt;code>.readWithPartitions()&lt;/code>&lt;/strong>.&lt;/p>
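&lt;p>The &lt;code>key1=value1;key2=value2&lt;/code> format expected by &lt;code>.withConnectionProperties(...)&lt;/code> can be assembled from a map rather than written by hand. The following is a minimal plain-Java sketch; the &lt;code>ConnectionPropertiesBuilder&lt;/code> class and its &lt;code>join&lt;/code> method are hypothetical helpers and are not part of SingleStoreIO.&lt;/p>

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of SingleStoreIO.
final class ConnectionPropertiesBuilder {
  private ConnectionPropertiesBuilder() {}

  /** Joins the entries as "key1=value1;key2=value2", in the map's iteration order. */
  static String join(Map<String, String> properties) {
    return properties.entrySet().stream()
        .map(entry -> entry.getKey() + "=" + entry.getValue())
        .collect(Collectors.joining(";"));
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so the output is deterministic.
    Map<String, String> properties = new LinkedHashMap<>();
    properties.put("connectTimeout", "30000");
    properties.put("useServerPrepStmts", "FALSE");
    System.out.println(join(properties));
  }
}
```

&lt;p>The resulting string could then be passed as the argument of &lt;code>.withConnectionProperties(...)&lt;/code>.&lt;/p>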
&lt;h2 id="reading-from-singlestoredb">Reading from SingleStoreDB&lt;/h2>
&lt;p>One of the functions of SingleStoreIO is reading from SingleStoreDB tables.
SingleStoreIO supports two types of reading:&lt;/p>
&lt;ul>
&lt;li>Sequential data reading (&lt;code>.read()&lt;/code>)&lt;/li>
&lt;li>Parallel data reading (&lt;code>.readWithPartitions()&lt;/code>)&lt;/li>
&lt;/ul>
&lt;p>In many cases, parallel data reading is preferred over sequential data reading for performance reasons.&lt;/p>
&lt;h3 id="sequential-data-reading">Sequential data reading&lt;/h3>
&lt;p>The basic &lt;code>.read()&lt;/code> operation usage is as follows:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>PCollection&amp;lt;USER_DATA_TYPE&amp;gt; items = pipeline.apply(
SingleStoreIO.&amp;lt;USER_DATA_TYPE&amp;gt;read()
.withDataSourceConfiguration(dc)
.withTable(&amp;#34;MY_TABLE&amp;#34;) // or .withQuery(&amp;#34;QUERY&amp;#34;)
.withStatementPreparator(statementPreparator)
.withOutputParallelization(true)
.withRowMapper(mapper)
);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>Where parameters can be:&lt;/p>
&lt;ul>
&lt;li>&lt;code>.withDataSourceConfiguration(dataSourceConfiguration)&lt;/code>
&lt;ul>
&lt;li>&lt;code>DataSourceConfiguration&lt;/code> object with all information needed to establish a connection to the database. See &lt;a href="#authentication">authentication&lt;/a> for more information.&lt;/li>
&lt;li>Required parameter.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withTable(table)&lt;/code>
&lt;ul>
&lt;li>Table to read data from.&lt;/li>
&lt;li>Example: &lt;code>.withTable(&amp;quot;MY_TABLE&amp;quot;)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withQuery(query)&lt;/code>
&lt;ul>
&lt;li>SQL query to execute.&lt;/li>
&lt;li>Example: &lt;code>.withQuery(&amp;quot;SELECT * FROM MY_TABLE&amp;quot;)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withStatementPreparator(statementPreparator)&lt;/code>
&lt;ul>
&lt;li>&lt;a href="#statementpreparator">StatementPreparator&lt;/a> object.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withRowMapper(rowMapper)&lt;/code>
&lt;ul>
&lt;li>&lt;a href="#rowmapper">RowMapper&lt;/a> object.&lt;/li>
&lt;li>Required parameter.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withOutputParallelization(outputParallelization)&lt;/code>
&lt;ul>
&lt;li>Boolean value that indicates whether to reshuffle the result.&lt;/li>
&lt;li>Default - &lt;code>true&lt;/code>.&lt;/li>
&lt;li>Example: &lt;code>.withOutputParallelization(true)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note&lt;/strong> - either &lt;code>.withTable(...)&lt;/code> or &lt;code>.withQuery(...)&lt;/code> &lt;strong>is required&lt;/strong>.&lt;/p>
&lt;h3 id="parallel-data-reading">Parallel data reading&lt;/h3>
&lt;p>The basic &lt;code>.readWithPartitions()&lt;/code> operation usage is as follows:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>PCollection&amp;lt;USER_DATA_TYPE&amp;gt; items = pipeline.apply(
SingleStoreIO.&amp;lt;USER_DATA_TYPE&amp;gt;readWithPartitions()
.withDataSourceConfiguration(dc)
.withTable(&amp;#34;MY_TABLE&amp;#34;) // or .withQuery(&amp;#34;QUERY&amp;#34;)
.withRowMapper(mapper)
);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>Where parameters can be:&lt;/p>
&lt;ul>
&lt;li>&lt;code>.withDataSourceConfiguration(dataSourceConfiguration)&lt;/code>
&lt;ul>
&lt;li>&lt;code>DataSourceConfiguration&lt;/code> object with all information needed to establish a connection to the database. See &lt;a href="#authentication">DataSource Configuration&lt;/a> for more information.&lt;/li>
&lt;li>Required parameter.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withTable(table)&lt;/code>
&lt;ul>
&lt;li>Table to read data from.&lt;/li>
&lt;li>Example: &lt;code>.withTable(&amp;quot;MY_TABLE&amp;quot;)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withQuery(query)&lt;/code>
&lt;ul>
&lt;li>SQL query to execute.&lt;/li>
&lt;li>Example: &lt;code>.withQuery(&amp;quot;SELECT * FROM MY_TABLE&amp;quot;)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withRowMapper(rowMapper)&lt;/code>
&lt;ul>
&lt;li>&lt;a href="#rowmapper">RowMapper&lt;/a> object.&lt;/li>
&lt;li>Required parameter.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note&lt;/strong> - either &lt;code>.withTable(...)&lt;/code> or &lt;code>.withQuery(...)&lt;/code> &lt;strong>is required&lt;/strong>.&lt;/p>
&lt;h3 id="statementpreparator">StatementPreparator&lt;/h3>
&lt;p>The &lt;code>StatementPreparator&lt;/code> is used by &lt;code>read()&lt;/code> to set the parameters of the &lt;code>PreparedStatement&lt;/code>.
For example:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>public static class MyStatementPreparator implements SingleStoreIO.StatementPreparator {
@Override
public void setParameters(PreparedStatement preparedStatement) throws Exception {
preparedStatement.setInt(1, 10);
}
}&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h3 id="rowmapper">RowMapper&lt;/h3>
&lt;p>The &lt;code>RowMapper&lt;/code> is used by &lt;code>read()&lt;/code> and &lt;code>readWithPartitions()&lt;/code> for converting each row of the &lt;code>ResultSet&lt;/code>
into an element of the resulting &lt;code>PCollection&lt;/code>.
For example:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>public static class MyRowMapper implements SingleStoreIO.RowMapper&amp;lt;MyRow&amp;gt; {
@Override
public MyRow mapRow(ResultSet resultSet) throws Exception {
return MyRow.create(resultSet.getInt(1), resultSet.getString(2));
}
}&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
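&lt;p>The &lt;code>MyRow&lt;/code> type used in the mapper examples is not defined on this page. The following is a minimal sketch of what such a class might look like, assuming two fields (an integer &lt;code>id&lt;/code> and a string &lt;code>name&lt;/code>) inferred from the calls &lt;code>MyRow.create(...)&lt;/code>, &lt;code>id()&lt;/code>, and &lt;code>name()&lt;/code>. Implementing &lt;code>Serializable&lt;/code> is an assumption made here so that the type can be serialized between workers; your pipeline may prefer an explicit coder.&lt;/p>

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical value class matching the calls used in the mapper examples:
// MyRow.create(int, String), id(), and name().
final class MyRow implements Serializable {
  private static final long serialVersionUID = 1L;

  private final Integer id;
  private final String name;

  private MyRow(Integer id, String name) {
    this.id = Objects.requireNonNull(id, "id");
    this.name = name;
  }

  /** Factory method; an int argument is boxed to Integer automatically. */
  static MyRow create(Integer id, String name) {
    return new MyRow(id, name);
  }

  Integer id() {
    return id;
  }

  String name() {
    return name;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof MyRow)) {
      return false;
    }
    MyRow other = (MyRow) o;
    return id.equals(other.id) && Objects.equals(name, other.name);
  }

  @Override
  public int hashCode() {
    return Objects.hash(id, name);
  }

  @Override
  public String toString() {
    return "MyRow{id=" + id + ", name=" + name + "}";
  }
}
```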
&lt;h2 id="writing-to-singlestoredb-tables">Writing to SingleStoreDB tables&lt;/h2>
&lt;p>SingleStoreIO can also write to SingleStoreDB tables.
The write transform sends the contents of a &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollection.html">PCollection&lt;/a> to your SingleStoreDB database.
It returns the number of rows written by each batch of elements.&lt;/p>
&lt;p>The basic &lt;code>.write()&lt;/code> operation usage is as follows:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>data.apply(
SingleStoreIO.&amp;lt;USER_DATA_TYPE&amp;gt;write()
.withDataSourceConfiguration(dc)
.withTable(&amp;#34;MY_TABLE&amp;#34;)
.withUserDataMapper(mapper)
.withBatchSize(100000)
);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>Where parameters can be:&lt;/p>
&lt;ul>
&lt;li>&lt;code>.withDataSourceConfiguration(dataSourceConfiguration)&lt;/code>
&lt;ul>
&lt;li>&lt;code>DataSourceConfiguration&lt;/code> object with all information needed to establish a connection to the database. See &lt;a href="#authentication">DataSource Configuration&lt;/a> for more information.&lt;/li>
&lt;li>Required parameter.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withTable(table)&lt;/code>
&lt;ul>
&lt;li>Table in which data should be saved.&lt;/li>
&lt;li>Required parameter.&lt;/li>
&lt;li>Example: &lt;code>.withTable(&amp;quot;MY_TABLE&amp;quot;)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withBatchSize(batchSize)&lt;/code>
&lt;ul>
&lt;li>Number of rows loaded by one &lt;code>LOAD DATA&lt;/code> query.&lt;/li>
&lt;li>Default - 100000.&lt;/li>
&lt;li>Example: &lt;code>.withBatchSize(100000)&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withUserDataMapper(userDataMapper)&lt;/code>
&lt;ul>
&lt;li>&lt;a href="#userdatamapper">UserDataMapper&lt;/a> object.&lt;/li>
&lt;li>Required parameter.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
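&lt;p>To illustrate what the batch size controls, the sketch below groups a list of rows into batches of at most &lt;code>batchSize&lt;/code> elements, where each batch would correspond to one &lt;code>LOAD DATA&lt;/code> query. This is a plain-Java illustration of the idea only, not the connector's implementation; the &lt;code>BatchingSketch&lt;/code> class and its &lt;code>partition&lt;/code> method are hypothetical.&lt;/p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of batching; not how SingleStoreIO is implemented.
final class BatchingSketch {
  private BatchingSketch() {}

  /** Splits rows into consecutive batches holding at most batchSize elements each. */
  static <T> List<List<T>> partition(List<T> rows, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive, got " + batchSize);
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < rows.size(); start += batchSize) {
      int end = Math.min(start + batchSize, rows.size());
      // Copy the view so each batch is independent of the input list.
      batches.add(new ArrayList<>(rows.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    // Five rows with a batch size of two yield batches of sizes 2, 2, and 1.
    for (List<Integer> batch : partition(Arrays.asList(1, 2, 3, 4, 5), 2)) {
      System.out.println(batch.size() + " rows in this batch");
    }
  }
}
```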
&lt;h3 id="userdatamapper">UserDataMapper&lt;/h3>
&lt;p>The &lt;code>UserDataMapper&lt;/code> is required to map data from a &lt;code>PCollection&lt;/code> to an array of &lt;code>String&lt;/code> values before the &lt;code>write()&lt;/code> operation saves the data.
For example:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>public static class MyRowDataMapper implements SingleStoreIO.UserDataMapper&amp;lt;MyRow&amp;gt; {
@Override
public List&amp;lt;String&amp;gt; mapRow(MyRow element) {
List&amp;lt;String&amp;gt; res = new ArrayList&amp;lt;&amp;gt;();
res.add(element.id().toString());
res.add(element.name());
return res;
}
}&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p></description></item><item><title>Documentation: Apache Snowflake I/O connector</title><link>/documentation/io/built-in/snowflake/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/io/built-in/snowflake/</guid><description>
&lt;h1 id="snowflake-io">Snowflake I/O&lt;/h1>
&lt;p>Pipeline options and general information about using and running Snowflake IO.&lt;/p>
&lt;h2 id="before-you-start">Before you start&lt;/h2>
&lt;p>To use SnowflakeIO, add the Maven artifact dependency to your &lt;code>pom.xml&lt;/code> file.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>&amp;lt;dependency&amp;gt;
&amp;lt;groupId&amp;gt;org.apache.beam&amp;lt;/groupId&amp;gt;
&amp;lt;artifactId&amp;gt;beam-sdks-java-io-snowflake&amp;lt;/artifactId&amp;gt;
&amp;lt;version&amp;gt;2.56.0&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>Additional resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/apache/beam/tree/master/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake">SnowflakeIO source code&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/javadoc/2.56.0/org/apache/beam/sdk/io/snowflake/SnowflakeIO.html">SnowflakeIO Javadoc&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://docs.snowflake.com/en/">Snowflake documentation&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="authentication">Authentication&lt;/h2>
&lt;p>Reading and batch writing support the following authentication methods:&lt;/p>
&lt;ul>
&lt;li>Username and password&lt;/li>
&lt;li>Key pair&lt;/li>
&lt;li>OAuth token&lt;/li>
&lt;/ul>
&lt;p>Streaming writing supports only key pair authentication. For details, see: &lt;a href="https://issues.apache.org/jira/browse/BEAM-3304">BEAM-3304&lt;/a>.&lt;/p>
&lt;p>Credentials are passed through Pipeline options, which are used to instantiate the &lt;code>SnowflakeIO.DataSourceConfiguration&lt;/code> class. Each authentication method configures this class in a different way.&lt;/p>
&lt;h3 id="username-and-password">Username and password&lt;/h3>
&lt;p>To use username/password authentication in SnowflakeIO, invoke your pipeline with the following Pipeline options:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>--username=&amp;lt;USERNAME&amp;gt; --password=&amp;lt;PASSWORD&amp;gt;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>The &lt;code>SnowflakeIO.DataSourceConfiguration&lt;/code> class may then be initialized as follows:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>SnowflakeIO.DataSourceConfiguration datasource = SnowflakeIO.DataSourceConfiguration.create()
.withUsernamePasswordAuth(
options.getUsername(),
options.getPassword())
.withServerName(options.getServerName())
.withDatabase(options.getDatabase())
.withRole(options.getRole())
.withWarehouse(options.getWarehouse())
.withSchema(options.getSchema());&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h3 id="key-pair">Key pair&lt;/h3>
&lt;p>To use this authentication method, you must first generate a key pair and associate the public key with the Snowflake user that will connect using the IO transform. For instructions, see &lt;a href="https://docs.snowflake.com/en/user-guide/key-pair-auth.html">Key Pair Authentication &amp;amp; Key Pair Rotation&lt;/a> in the Snowflake documentation.&lt;/p>
&lt;p>To use key pair authentication with SnowflakeIO, invoke your pipeline with one of the following sets of Pipeline options:&lt;/p>
&lt;ul>
&lt;li>Passing the key as a path:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code> --username=&amp;lt;USERNAME&amp;gt; --privateKeyPath=&amp;lt;PATH_TO_P8_FILE&amp;gt; --privateKeyPassphrase=&amp;lt;PASSWORD_FOR_KEY&amp;gt;
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
The initialization of a &lt;code>SnowflakeIO.DataSourceConfiguration&lt;/code> class may be as follows:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code> SnowflakeIO.DataSourceConfiguration datasource = SnowflakeIO.DataSourceConfiguration.create()
.withKeyPairPathAuth(
options.getUsername(),
options.getPrivateKeyPath(),
options.getPrivateKeyPassphrase())
.withServerName(options.getServerName())
.withDatabase(options.getDatabase())
.withRole(options.getRole())
.withWarehouse(options.getWarehouse())
.withSchema(options.getSchema());
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/li>
&lt;li>Passing the key as a value:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code> --username=&amp;lt;USERNAME&amp;gt; --rawPrivateKey=&amp;lt;PRIVATE_KEY&amp;gt; --privateKeyPassphrase=&amp;lt;PASSWORD_FOR_KEY&amp;gt;
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
The initialization of a &lt;code>SnowflakeIO.DataSourceConfiguration&lt;/code> class may be as follows:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code> SnowflakeIO.DataSourceConfiguration datasource = SnowflakeIO.DataSourceConfiguration.create()
.withKeyPairRawAuth(
options.getUsername(),
options.getRawPrivateKey(),
options.getPrivateKeyPassphrase())
.withServerName(options.getServerName())
.withDatabase(options.getDatabase())
.withRole(options.getRole())
.withWarehouse(options.getWarehouse())
.withSchema(options.getSchema());
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/li>
&lt;/ul>
&lt;h3 id="oauth-token">OAuth token&lt;/h3>
&lt;p>SnowflakeIO also supports OAuth token authentication.&lt;/p>
&lt;p>&lt;strong>IMPORTANT&lt;/strong>: SnowflakeIO requires a valid OAuth access token. It can neither refresh the token nor obtain one using a web-based flow. For information on configuring an OAuth integration and obtaining the token, see the &lt;a href="https://docs.snowflake.com/en/user-guide/oauth-intro.html">Snowflake documentation&lt;/a>.&lt;/p>
&lt;p>Once you have the token, invoke your pipeline with the following Pipeline options:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code>--oauthToken=&amp;lt;TOKEN&amp;gt;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
The initialization of a &lt;code>SnowflakeIO.DataSourceConfiguration&lt;/code> class may be as follows:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code> SnowflakeIO.DataSourceConfiguration datasource = SnowflakeIO.DataSourceConfiguration
.create()
.withOAuth(options.getOauthToken())
.withUrl(options.getUrl())
.withServerName(options.getServerName())
.withDatabase(options.getDatabase())
.withWarehouse(options.getWarehouse())
.withSchema(options.getSchema());&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h2 id="datasource-configuration">DataSource Configuration&lt;/h2>
&lt;p>DataSource configuration is required in both read and write objects to configure the Snowflake connection properties.&lt;/p>
&lt;h3 id="general-usage">General usage&lt;/h3>
&lt;p>Create the DataSource configuration:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;pre tabindex="0">&lt;code> SnowflakeIO.DataSourceConfiguration
.create()
.withUrl(options.getUrl())
.withServerName(options.getServerName())
.withDatabase(options.getDatabase())
.withWarehouse(options.getWarehouse())
.withSchema(options.getSchema());&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
Where parameters can be:&lt;/p>
&lt;ul>
&lt;li>&lt;code> .withUrl(...)&lt;/code>
&lt;ul>
&lt;li>JDBC-like URL for your Snowflake account, including account name and region, without any parameters.&lt;/li>
&lt;li>Example: &lt;code>.withUrl(&amp;quot;jdbc:snowflake://account.snowflakecomputing.com&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withServerName(...)&lt;/code>
&lt;ul>
&lt;li>Server Name - full server name with account, zone and domain.&lt;/li>
&lt;li>Example: &lt;code>.withServerName(&amp;quot;account.snowflakecomputing.com&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withDatabase(...)&lt;/code>
&lt;ul>
&lt;li>Name of the Snowflake database to use.&lt;/li>
&lt;li>Example: &lt;code>.withDatabase(&amp;quot;MY_DATABASE&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withWarehouse(...)&lt;/code>
&lt;ul>
&lt;li>Name of the Snowflake warehouse to use. This parameter is optional. If no warehouse name is specified, the default warehouse for the user is used.&lt;/li>
&lt;li>Example: &lt;code>.withWarehouse(&amp;quot;MY_WAREHOUSE&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withSchema(...)&lt;/code>
&lt;ul>
&lt;li>Name of the schema in the database to use. This parameter is optional.&lt;/li>
&lt;li>Example: &lt;code>.withSchema(&amp;quot;PUBLIC&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withUsernamePasswordAuth(username, password)&lt;/code>
&lt;ul>
&lt;li>Sets username/password authentication.&lt;/li>
&lt;li>Example: &lt;code>.withUsernamePasswordAuth(&amp;quot;USERNAME&amp;quot;, &amp;quot;PASSWORD&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withOAuth(token)&lt;/code>
&lt;ul>
&lt;li>Sets OAuth authentication.&lt;/li>
&lt;li>Example: &lt;code>.withOAuth(&amp;quot;TOKEN&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withKeyPairAuth(username, privateKey)&lt;/code>
&lt;ul>
&lt;li>Sets key pair authentication using username and &lt;a href="https://docs.oracle.com/javase/8/docs/api/java/security/PrivateKey.html">PrivateKey&lt;/a>&lt;/li>
&lt;li>Example: &lt;code>.withKeyPairAuth(&amp;quot;USERNAME&amp;quot;,&lt;/code> &lt;a href="https://docs.oracle.com/javase/8/docs/api/java/security/PrivateKey.html">PrivateKey&lt;/a>&lt;code>)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withKeyPairPathAuth(username, privateKeyPath, privateKeyPassphrase)&lt;/code>
&lt;ul>
&lt;li>Sets key pair authentication using username, path to private key file and passphrase.&lt;/li>
&lt;li>Example: &lt;code>.withKeyPairPathAuth(&amp;quot;USERNAME&amp;quot;, &amp;quot;PATH/TO/KEY.P8&amp;quot;, &amp;quot;PASSPHRASE&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>.withKeyPairRawAuth(username, rawPrivateKey, privateKeyPassphrase)&lt;/code>
&lt;ul>
&lt;li>Sets key pair authentication using username, private key and passphrase.&lt;/li>
&lt;li>Example: &lt;code>.withKeyPairRawAuth(&amp;quot;USERNAME&amp;quot;, &amp;quot;PRIVATE_KEY&amp;quot;, &amp;quot;PASSPHRASE&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note&lt;/strong> - either &lt;code>.withUrl(...)&lt;/code> or &lt;code>.withServerName(...)&lt;/code> &lt;strong>is required&lt;/strong>.&lt;/p>
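&lt;p>&lt;code>.withKeyPairAuth(username, privateKey)&lt;/code> takes a &lt;code>java.security.PrivateKey&lt;/code> object. The sketch below shows one possible way to obtain such an object from PEM text using only the JDK. It assumes an unencrypted PKCS#8 key (a &lt;code>BEGIN PRIVATE KEY&lt;/code> block with no passphrase) that uses the RSA algorithm; the &lt;code>PemPrivateKeyLoader&lt;/code> class is a hypothetical helper and is not part of SnowflakeIO. For passphrase-protected keys, the path-based and raw-key methods listed above accept the passphrase directly.&lt;/p>

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of SnowflakeIO.
final class PemPrivateKeyLoader {
  private PemPrivateKeyLoader() {}

  /**
   * Parses an unencrypted PKCS#8 PEM block into a PrivateKey.
   * Assumes the key is an RSA key.
   */
  static PrivateKey parseUnencryptedPkcs8(String pem) throws GeneralSecurityException {
    // Strip the PEM armor and all whitespace, leaving only the Base64 payload.
    String base64 = pem
        .replace("-----BEGIN PRIVATE KEY-----", "")
        .replace("-----END PRIVATE KEY-----", "")
        .replaceAll("\\s", "");
    byte[] der = Base64.getDecoder().decode(base64);
    return KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
  }
}
```

&lt;p>The returned key could then be passed as the second argument of &lt;code>.withKeyPairAuth(...)&lt;/code>.&lt;/p>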
&lt;h2 id="pipeline-options">Pipeline options&lt;/h2>
&lt;p>Use Beam’s &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/options/PipelineOptions.html">Pipeline options&lt;/a> to set options via the command line.&lt;/p>
&lt;h3 id="snowflake-pipeline-options">Snowflake Pipeline options&lt;/h3>
&lt;p>The SnowflakeIO library supports the following options, which can be passed via the &lt;a href="/documentation/io/built-in/snowflake/#running-main-command-with-pipeline-options">command line&lt;/a> by default when a Pipeline uses them:&lt;/p>
&lt;p>&lt;code>--url&lt;/code> Snowflake&amp;rsquo;s JDBC-like URL, including account name and region, without any parameters.&lt;/p>
&lt;p>&lt;code>--serverName&lt;/code> Full server name with account, zone and domain.&lt;/p>
&lt;p>&lt;code>--username&lt;/code> Required for username/password and Private Key authentication.&lt;/p>
&lt;p>&lt;code>--oauthToken&lt;/code> Required for OAuth authentication only.&lt;/p>
&lt;p>&lt;code>--password&lt;/code> Required for username/password authentication only.&lt;/p>
&lt;p>&lt;code>--privateKeyPath&lt;/code> Path to Private Key file. Required for Private Key authentication only.&lt;/p>
&lt;p>&lt;code>--rawPrivateKey&lt;/code> Private Key. Required for Private Key authentication only.&lt;/p>
&lt;p>&lt;code>--privateKeyPassphrase&lt;/code> Private Key&amp;rsquo;s passphrase. Required for Private Key authentication only.&lt;/p>
&lt;p>&lt;code>--stagingBucketName&lt;/code> External bucket path ending with &lt;code>/&lt;/code>, for example &lt;code>{gs,s3}://bucket/&lt;/code>. Sub-directories are allowed.&lt;/p>
&lt;p>&lt;code>--storageIntegrationName&lt;/code> Storage integration name.&lt;/p>
&lt;p>&lt;code>--warehouse&lt;/code> Warehouse to use. Optional.&lt;/p>
&lt;p>&lt;code>--database&lt;/code> Database name to connect to. Optional.&lt;/p>
&lt;p>&lt;code>--schema&lt;/code> Schema to use. Optional.&lt;/p>
&lt;p>&lt;code>--table&lt;/code> Table to use. Optional.&lt;/p>
&lt;p>&lt;code>--query&lt;/code> Query to use. Optional.&lt;/p>
&lt;p>&lt;code>--role&lt;/code> Role to use. Optional.&lt;/p>
&lt;p>&lt;code>--authenticator&lt;/code> Authenticator to use. Optional.&lt;/p>
&lt;p>&lt;code>--portNumber&lt;/code> Port number. Optional.&lt;/p>
&lt;p>&lt;code>--loginTimeout&lt;/code> Login timeout. Optional.&lt;/p>
&lt;p>&lt;code>--snowPipe&lt;/code> SnowPipe name. Optional.&lt;/p>
&lt;h3 id="running-main-command-with-pipeline-options">Running main command with Pipeline options&lt;/h3>
&lt;p>To pass Pipeline options via the command line, use &lt;code>--args&lt;/code> in a Gradle command as follows:&lt;/p>
&lt;p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>./gradlew run
--args=&amp;#34;
--serverName=&amp;lt;SNOWFLAKE SERVER NAME&amp;gt;
Example: --serverName=account.region.gcp.snowflakecomputing.com
--username=&amp;lt;SNOWFLAKE USERNAME&amp;gt;
Example: --username=testuser
--password=&amp;lt;SNOWFLAKE PASSWORD&amp;gt;
Example: --password=mypassword
--database=&amp;lt;SNOWFLAKE DATABASE&amp;gt;
Example: --database=TEST_DATABASE
--schema=&amp;lt;SNOWFLAKE SCHEMA&amp;gt;
Example: --schema=public
--table=&amp;lt;SNOWFLAKE TABLE IN DATABASE&amp;gt;
Example: --table=TEST_TABLE
--query=&amp;lt;IF NOT TABLE THEN QUERY&amp;gt;
Example: --query=&amp;#39;SELECT column FROM TABLE&amp;#39;
--storageIntegrationName=&amp;lt;SNOWFLAKE STORAGE INTEGRATION NAME&amp;gt;
Example: --storageIntegrationName=my_integration
--stagingBucketName=&amp;lt;GCS OR S3 BUCKET&amp;gt;
Example: --stagingBucketName={gs,s3}://bucket/
--runner=&amp;lt;DirectRunner/DataflowRunner&amp;gt;
Example: --runner=DataflowRunner
--project=&amp;lt;FOR DATAFLOW RUNNER: GCP PROJECT NAME&amp;gt;
Example: --project=my_project
--tempLocation=&amp;lt;FOR DATAFLOW RUNNER: GCS TEMP LOCATION STARTING WITH gs://…&amp;gt;
Example: --tempLocation=gs://bucket/temp/
--region=&amp;lt;FOR DATAFLOW RUNNER: GCP REGION&amp;gt;
Example: --region=us-east1
--appName=&amp;lt;OPTIONAL: DATAFLOW JOB NAME PREFIX&amp;gt;
Example: --appName=my_job&amp;#34;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
Then, in the code, you can read the parsed values from the options object, for example with &lt;code>options.getStagingBucketName()&lt;/code>.&lt;/p>
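&lt;p>As a rough illustration of how a &lt;code>--name=value&lt;/code> argument relates to a getter such as &lt;code>getStagingBucketName()&lt;/code>, the following plain-JDK sketch looks up an option and derives the getter name. In a real pipeline, Beam parses the arguments for you; &lt;code>findOption&lt;/code> and &lt;code>getterNameFor&lt;/code> are hypothetical helpers written only for this example and are not part of Beam or SnowflakeIO.&lt;/p>

```java
// Plain-JDK illustration only; Beam performs the real argument parsing.
class OptionLookupSketch {

  // Returns the value of "--name=value" from args, or null when the option is absent.
  // Hypothetical helper, not a Beam or SnowflakeIO API.
  static String findOption(String[] args, String name) {
    String prefix = "--" + name + "=";
    for (String arg : args) {
      if (arg.startsWith(prefix)) {
        return arg.substring(prefix.length());
      }
    }
    return null;
  }

  // Derives the JavaBean-style getter name that corresponds to an option name.
  static String getterNameFor(String optionName) {
    return "get" + Character.toUpperCase(optionName.charAt(0)) + optionName.substring(1);
  }

  public static void main(String[] args) {
    String[] sample = {
      "--serverName=account.region.gcp.snowflakecomputing.com",
      "--stagingBucketName=gs://bucket/"
    };
    System.out.println(getterNameFor("stagingBucketName"));
    System.out.println(findOption(sample, "stagingBucketName"));
  }
}
```

&lt;p>The sketch only mirrors the naming convention; it does not validate values or handle options passed without &lt;code>=&lt;/code>.&lt;/p>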
&lt;h3 id="running-test-command-with-pipeline-options">Running test command with Pipeline options&lt;/h3>
&lt;p>To pass Pipeline options via the command line, use &lt;code>-DintegrationTestPipelineOptions&lt;/code> in a Gradle command as follows:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>./gradlew test --tests nameOfTest
-DintegrationTestPipelineOptions=&amp;#39;[
&amp;#34;--serverName=&amp;lt;SNOWFLAKE SERVER NAME&amp;gt;&amp;#34;,
Example: --serverName=account.region.gcp.snowflakecomputing.com
&amp;#34;--username=&amp;lt;SNOWFLAKE USERNAME&amp;gt;&amp;#34;,
Example: --username=testuser
&amp;#34;--password=&amp;lt;SNOWFLAKE PASSWORD&amp;gt;&amp;#34;,
Example: --password=mypassword
&amp;#34;--schema=&amp;lt;SNOWFLAKE SCHEMA&amp;gt;&amp;#34;,
Example: --schema=PUBLIC
&amp;#34;--table=&amp;lt;SNOWFLAKE TABLE IN DATABASE&amp;gt;&amp;#34;,
Example: --table=TEST_TABLE
&amp;#34;--database=&amp;lt;SNOWFLAKE DATABASE&amp;gt;&amp;#34;,
Example: --database=TEST_DATABASE
&amp;#34;--storageIntegrationName=&amp;lt;SNOWFLAKE STORAGE INTEGRATION NAME&amp;gt;&amp;#34;,
Example: --storageIntegrationName=my_integration
&amp;#34;--stagingBucketName=&amp;lt;GCS OR S3 BUCKET&amp;gt;&amp;#34;,
Example: --stagingBucketName={gs,s3}://bucket/
&amp;#34;--externalLocation=&amp;lt;GCS BUCKET URL STARTING WITH GS://&amp;gt;&amp;#34;
Example: --externalLocation=gs://bucket/temp/
]&amp;#39; --no-build-cache&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;p>All parameters start with &lt;code>--&lt;/code>, are surrounded by double quotation marks, and are separated by commas:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>--serverName=&amp;lt;SNOWFLAKE SERVER NAME&amp;gt;&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Specifies the full name of your account (provided by Snowflake). Note that your full account name might include additional segments that identify the region and cloud platform where your account is hosted.&lt;/li>
&lt;li>Example: &lt;code>--serverName=xy12345.eu-west-1.gcp.snowflakecomputing.com&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--username=&amp;lt;SNOWFLAKE USERNAME&amp;gt;&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Specifies the login name of the user.&lt;/li>
&lt;li>Example: &lt;code>--username=my_username&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--password=&amp;lt;SNOWFLAKE PASSWORD&amp;gt;&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Specifies the password for the specified user.&lt;/li>
&lt;li>Example: &lt;code>--password=my_secret&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--schema=&amp;lt;SNOWFLAKE SCHEMA&amp;gt;&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Specifies the schema to use for the specified database once connected. The specified schema should be an existing schema for which the specified user’s role has privileges.&lt;/li>
&lt;li>Example: &lt;code>--schema=PUBLIC&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--table=&amp;lt;SNOWFLAKE TABLE IN DATABASE&amp;gt;&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Specifies the Snowflake table to use.&lt;/li>
&lt;li>Example: &lt;code>--table=MY_TABLE&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--database=&amp;lt;SNOWFLAKE DATABASE&amp;gt;&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Specifies the database to use once connected. The specified database should be an existing database for which the specified user’s role has privileges.&lt;/li>
&lt;li>Example: &lt;code>--database=MY_DATABASE&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--storageIntegrationName=&amp;lt;SNOWFLAKE STORAGE INTEGRATION NAME&amp;gt;&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Name of the storage integration created in &lt;a href="https://docs.snowflake.com/en/sql-reference/sql/create-storage-integration.html">Snowflake&lt;/a> for the cloud storage of your choice.&lt;/li>
&lt;li>Example: &lt;code>--storageIntegrationName=my_google_integration&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
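&lt;p>The quoting rules above can be sketched with a small plain-JDK helper that assembles the array passed to &lt;code>-DintegrationTestPipelineOptions&lt;/code>. The helper name &lt;code>toOptionsArray&lt;/code> is hypothetical, and the sketch assumes that option values contain no double quotation marks.&lt;/p>

```java
// Builds the bracketed, comma-separated, double-quoted option list described above.
class TestOptionsSketch {

  // Hypothetical helper: assumes values contain no double quotation marks.
  static String toOptionsArray(String[] options) {
    StringBuilder sb = new StringBuilder("[");
    String separator = "";
    for (String option : options) {
      if (!option.startsWith("--")) {
        throw new IllegalArgumentException("Option must start with --: " + option);
      }
      sb.append(separator).append('"').append(option).append('"');
      separator = ",";
    }
    return sb.append("]").toString();
  }

  public static void main(String[] args) {
    String[] options = {"--username=testuser", "--schema=PUBLIC", "--table=TEST_TABLE"};
    System.out.println("-DintegrationTestPipelineOptions='" + toOptionsArray(options) + "'");
  }
}
```

&lt;p>Generating the list this way avoids hand-editing mistakes such as a missing quotation mark or a stray comma.&lt;/p>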
&lt;h2 id="running-pipelines-on-dataflow">Running pipelines on Dataflow&lt;/h2>
&lt;p>By default, pipelines are run on the &lt;a href="/documentation/runners/direct/">Direct Runner&lt;/a> on your local machine. To run a pipeline on &lt;a href="https://cloud.google.com/dataflow/">Google Dataflow&lt;/a>, you must provide the following Pipeline options:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>--runner=DataflowRunner&lt;/code>&lt;/p>
&lt;ul>
&lt;li>The Dataflow-specific runner.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--project=&amp;lt;GCP PROJECT&amp;gt;&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Name of the Google Cloud Platform project.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--stagingBucketName=&amp;lt;GCS OR S3 BUCKET&amp;gt;&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Google Cloud Storage bucket or AWS S3 bucket where the Beam files will be staged.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--maxNumWorkers=5&lt;/code>&lt;/p>
&lt;ul>
&lt;li>(optional) Maximum number of workers.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--appName=&amp;lt;JOB NAME&amp;gt;&lt;/code>&lt;/p>
&lt;ul>
&lt;li>(optional) Prefix for the job name in the Dataflow Dashboard.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>More pipeline options for Dataflow can be found &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.html">here&lt;/a>.&lt;/p>
&lt;p>&lt;strong>Note&lt;/strong>: To properly authenticate with Google Cloud, please use &lt;a href="https://cloud.google.com/sdk/gcloud/">gcloud&lt;/a> or follow the &lt;a href="https://cloud.google.com/docs/authentication/">Google Cloud documentation&lt;/a>.&lt;/p>
&lt;p>&lt;strong>Important&lt;/strong>: Be aware of &lt;a href="https://cloud.google.com/dataflow/pricing">Google Dataflow pricing&lt;/a>.&lt;/p>
&lt;h3 id="running-pipeline-templates-on-dataflow">Running pipeline templates on Dataflow&lt;/h3>
&lt;p>Google Dataflow supports &lt;a href="https://cloud.google.com/dataflow/docs/guides/templates/overview">template&lt;/a> creation, which means staging pipelines on Cloud Storage and running them with the ability to pass runtime parameters that are only available during pipeline execution.&lt;/p>
&lt;p>The process of creating your own Dataflow template is as follows:&lt;/p>
&lt;ol>
&lt;li>Create your own pipeline.&lt;/li>
&lt;li>Create a &lt;a href="https://cloud.google.com/dataflow/docs/guides/templates/creating-templates#creating-and-staging-templates">Dataflow template&lt;/a>, checking which options SnowflakeIO supports at runtime.&lt;/li>
&lt;li>Run a Dataflow template using &lt;a href="https://cloud.google.com/dataflow/docs/guides/templates/running-templates#using-the-cloud-console">Cloud Console&lt;/a>, &lt;a href="https://cloud.google.com/dataflow/docs/guides/templates/running-templates#using-the-rest-api">REST API&lt;/a> or &lt;a href="https://cloud.google.com/dataflow/docs/guides/templates/running-templates#using-gcloud">gcloud&lt;/a>.&lt;/li>
&lt;/ol>
&lt;p>Currently, SnowflakeIO supports the following options at runtime:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>--serverName&lt;/code> Full server name with account, zone and domain.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--username&lt;/code> Required for username/password and Private Key authentication.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--password&lt;/code> Required for username/password authentication only.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--rawPrivateKey&lt;/code> Private Key. Required for Private Key authentication only.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--privateKeyPassphrase&lt;/code> Private Key&amp;rsquo;s passphrase. Required for Private Key authentication only.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--stagingBucketName&lt;/code> External bucket path ending with &lt;code>/&lt;/code>, for example &lt;code>{gs,s3}://bucket/&lt;/code>. Sub-directories are allowed.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--storageIntegrationName&lt;/code> Storage integration name.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--warehouse&lt;/code> Warehouse to use. Optional.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--database&lt;/code> Database name to connect to. Optional.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--schema&lt;/code> Schema to use. Optional.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--table&lt;/code> Table to use. Optional. Note: table is not in default pipeline options.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--query&lt;/code> Query to use. Optional. Note: query is not in default pipeline options.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--role&lt;/code> Role to use. Optional.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--snowPipe&lt;/code> SnowPipe name. Optional.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Currently, SnowflakeIO &lt;strong>doesn&amp;rsquo;t support&lt;/strong> the following options at runtime:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>--url&lt;/code> Snowflake&amp;rsquo;s JDBC-like URL including account name and region, without any parameters.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--oauthToken&lt;/code> Required for OAuth authentication only.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--privateKeyPath&lt;/code> Path to Private Key file. Required for Private Key authentication only.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--authenticator&lt;/code> Authenticator to use. Optional.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--portNumber&lt;/code> Port number. Optional.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>--loginTimeout&lt;/code> Login timeout. Optional.&lt;/p>
&lt;/li>
&lt;/ul>
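&lt;p>When preparing a template, it can help to check each option against the two lists above. The following plain-JDK sketch hard-codes the runtime-supported option names from this page; the class and method names are hypothetical, and the list must be kept in sync with the documentation by hand.&lt;/p>

```java
// Checks an option name against the runtime-supported list documented above.
class RuntimeOptionsSketch {

  // Copied from the list of options that SnowflakeIO supports at runtime.
  static final String[] RUNTIME_SUPPORTED = {
    "serverName", "username", "password", "rawPrivateKey", "privateKeyPassphrase",
    "stagingBucketName", "storageIntegrationName", "warehouse", "database",
    "schema", "table", "query", "role", "snowPipe"
  };

  // Hypothetical helper: true when the option can be supplied at template runtime.
  static boolean isRuntimeSupported(String optionName) {
    for (String supported : RUNTIME_SUPPORTED) {
      if (supported.equals(optionName)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String[] candidates = {"serverName", "url", "privateKeyPath", "snowPipe"};
    for (String candidate : candidates) {
      System.out.println(candidate + " -> " + isRuntimeSupported(candidate));
    }
  }
}
```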
&lt;h2 id="writing-to-snowflake-tables">Writing to Snowflake tables&lt;/h2>
&lt;p>One of the functions of SnowflakeIO is writing to Snowflake tables. This transformation enables you to finish the Beam pipeline with an output operation that sends the user&amp;rsquo;s &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollection.html">PCollection&lt;/a> to your Snowflake database.&lt;/p>
&lt;h3 id="batch-write-from-a-bounded-source">Batch write (from a bounded source)&lt;/h3>
&lt;p>The basic &lt;code>.write()&lt;/code> operation usage is as follows:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>data.apply(
SnowflakeIO.&amp;lt;type&amp;gt;write()
.withDataSourceConfiguration(dc)
.to(&amp;#34;MY_TABLE&amp;#34;)
.withStagingBucketName(&amp;#34;BUCKET&amp;#34;)
.withStorageIntegrationName(&amp;#34;STORAGE INTEGRATION NAME&amp;#34;)
.withUserDataMapper(mapper)
)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
Replace type with the data type of the &lt;code>PCollection&lt;/code> object to write; for example, &lt;code>SnowflakeIO.&amp;lt;String&amp;gt;&lt;/code> for an input &lt;code>PCollection&lt;/code> of Strings.&lt;/p>
&lt;p>All of the following parameters are required:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>.withDataSourceConfiguration()&lt;/code> Accepts a DataSourceConfiguration object.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.to()&lt;/code> Accepts the target Snowflake table name.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withStagingBucketName()&lt;/code> Accepts a cloud bucket path ending with a slash.
Example: &lt;code>.withStagingBucketName(&amp;quot;{gs,s3}://bucket/my/dir/&amp;quot;)&lt;/code>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withStorageIntegrationName()&lt;/code> Accepts a name of a Snowflake storage integration object created according to Snowflake documentation. Examples:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>CREATE OR REPLACE STORAGE INTEGRATION &amp;#34;test_integration&amp;#34;
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = (&amp;#39;gcs://bucket/&amp;#39;);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>CREATE STORAGE INTEGRATION &amp;#34;test_integration&amp;#34;
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = &amp;#39;&amp;lt;ARN ROLE NAME&amp;gt;&amp;#39;
STORAGE_ALLOWED_LOCATIONS = (&amp;#39;s3://bucket/&amp;#39;);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
Then:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>.withStorageIntegrationName(&amp;#34;test_integration&amp;#34;)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withUserDataMapper()&lt;/code> Accepts the UserDataMapper function that will map a user&amp;rsquo;s PCollection to an array of String values &lt;code>(String[])&lt;/code>.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note&lt;/strong>:
SnowflakeIO uses &lt;code>COPY&lt;/code> statements behind the scenes to write (using &lt;a href="https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html">COPY to table&lt;/a>). The staging bucket is used to store the CSV files that are then loaded into Snowflake. These CSV files are saved under the “stagingBucketName” path.&lt;/p>
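&lt;p>Because the staging bucket path must end with a slash, a small pre-flight check can catch a malformed value before the pipeline starts. The sketch below is plain JDK code; &lt;code>isValidStagingBucketName&lt;/code> is a hypothetical helper, and the assumption that only &lt;code>gs://&lt;/code> and &lt;code>s3://&lt;/code> prefixes are acceptable comes from the examples on this page, not from the SnowflakeIO source.&lt;/p>

```java
// Pre-flight check for the staging bucket path format described above.
class StagingBucketSketch {

  // Hypothetical helper: accepts gs:// or s3:// paths with a bucket name and a trailing slash.
  static boolean isValidStagingBucketName(String path) {
    if (!path.startsWith("gs://")) {
      if (!path.startsWith("s3://")) {
        return false;
      }
    }
    if (!path.endsWith("/")) {
      return false;
    }
    // Both schemes are 5 characters long; require at least one character before the final slash.
    return path.length() > 6;
  }

  public static void main(String[] args) {
    System.out.println(isValidStagingBucketName("gs://bucket/my/dir/"));
    System.out.println(isValidStagingBucketName("s3://bucket"));
  }
}
```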
&lt;p>&lt;strong>Optional&lt;/strong> for batching:&lt;/p>
&lt;ul>
&lt;li>&lt;code>.withQuotationMark()&lt;/code>
&lt;ul>
&lt;li>Default value: &lt;code>&amp;#39;&lt;/code> (single quotation mark).&lt;/li>
&lt;li>Accepts a String with one character, which surrounds all text (String) fields saved to CSV. It should be one of the characters accepted by &lt;a href="https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html">Snowflake’s&lt;/a> &lt;a href="https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html">FIELD_OPTIONALLY_ENCLOSED_BY&lt;/a> parameter (double quotation mark, single quotation mark or none).&lt;/li>
&lt;li>Example: &lt;code>.withQuotationMark(&amp;quot;'&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
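&lt;p>To make the effect of the quotation mark concrete, the following plain-JDK sketch surrounds every field of a row with the chosen mark and joins the fields with commas. It is only an illustration of the idea: &lt;code>toCsvLine&lt;/code> is a hypothetical helper, it does not escape quotation marks embedded in the data, and it is not the code SnowflakeIO uses internally.&lt;/p>

```java
// Illustrates how a quotation mark surrounds each text field of a CSV row.
class QuotationMarkSketch {

  // Hypothetical helper: no escaping of embedded quotation marks is attempted.
  static String toCsvLine(String[] fields, String quotationMark) {
    StringBuilder sb = new StringBuilder();
    String separator = "";
    for (String field : fields) {
      sb.append(separator).append(quotationMark).append(field).append(quotationMark);
      separator = ",";
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String[] row = {"Ann", "Berlin"};
    System.out.println(toCsvLine(row, "'"));
    System.out.println(toCsvLine(row, ""));
  }
}
```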
&lt;h3 id="streaming-write-from-unbounded-source">Streaming write (from unbounded source)&lt;/h3>
&lt;p>You must create a &lt;a href="https://docs.snowflake.com/en/user-guide/data-load-snowpipe.html">SnowPipe&lt;/a> in the Snowflake console. The SnowPipe should use the same integration and the same bucket as specified by the &lt;code>.withStagingBucketName&lt;/code> and &lt;code>.withStorageIntegrationName&lt;/code> methods. The write operation might look as follows:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>data.apply(
SnowflakeIO.&amp;lt;type&amp;gt;write()
.withStagingBucketName(&amp;#34;BUCKET&amp;#34;)
.withStorageIntegrationName(&amp;#34;STORAGE INTEGRATION NAME&amp;#34;)
.withDataSourceConfiguration(dc)
.withUserDataMapper(mapper)
.withSnowPipe(&amp;#34;MY_SNOW_PIPE&amp;#34;)
.withFlushTimeLimit(Duration.millis(time))
.withFlushRowLimit(rowsNumber)
.withShardsNumber(shardsNumber)
)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h4 id="parameters">Parameters&lt;/h4>
&lt;p>&lt;strong>Required&lt;/strong> for streaming:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>.withDataSourceConfiguration()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Accepts a DataSourceConfiguration object.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.to()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Accepts the target Snowflake table name.&lt;/li>
&lt;li>Example: &lt;code>.to(&amp;quot;MY_TABLE&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withStagingBucketName()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Accepts a cloud bucket path ending with a slash.&lt;/li>
&lt;li>Example: &lt;code>.withStagingBucketName(&amp;quot;{gs,s3}://bucket/my/dir/&amp;quot;)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withStorageIntegrationName()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Accepts a name of a Snowflake storage integration object created according to Snowflake documentation.&lt;/li>
&lt;li>Example:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>CREATE OR REPLACE STORAGE INTEGRATION &amp;#34;test_integration&amp;#34;
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = (&amp;#39;gcs://bucket/&amp;#39;);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>CREATE STORAGE INTEGRATION &amp;#34;test_integration&amp;#34;
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = &amp;#39;&amp;lt;ARN ROLE NAME&amp;gt;&amp;#39;
STORAGE_ALLOWED_LOCATIONS = (&amp;#39;s3://bucket/&amp;#39;);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
Then:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>.withStorageIntegrationName(&amp;#34;test_integration&amp;#34;)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withSnowPipe()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Accepts the exact name of the target SnowPipe.
Example:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>CREATE OR REPLACE PIPE &amp;#34;test_database&amp;#34;.&amp;#34;public&amp;#34;.&amp;#34;test_gcs_pipe&amp;#34;
AS COPY INTO stream_table from @streamstage;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Then:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>.withSnowPipe(&amp;#34;test_gcs_pipe&amp;#34;)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note&lt;/strong>: It is important to provide the &lt;strong>schema&lt;/strong> and &lt;strong>database&lt;/strong> names.&lt;/p>
&lt;ul>
&lt;li>&lt;code>.withUserDataMapper()&lt;/code>
&lt;ul>
&lt;li>Accepts the &lt;a href="/documentation/io/built-in/snowflake/#userdatamapper-function">UserDataMapper&lt;/a> function that will map a user&amp;rsquo;s PCollection to an array of String values &lt;code>(String[])&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note&lt;/strong>:&lt;/p>
&lt;p>As mentioned before, SnowflakeIO uses &lt;a href="https://docs.snowflake.com/en/user-guide/data-load-snowpipe.html">SnowPipe REST calls&lt;/a>
behind the scenes for writing from unbounded sources. The staging bucket is used to store the CSV files that are then loaded into Snowflake.
SnowflakeIO does not delete the created CSV files from the path under “stagingBucketName”, either during or after streaming.&lt;/p>
&lt;p>&lt;strong>Optional&lt;/strong> for streaming:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>.withFlushTimeLimit()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Default value: 30 seconds&lt;/li>
&lt;li>Accepts a Duration object specifying the interval after which each streaming write is repeated.&lt;/li>
&lt;li>Example: &lt;code>.withFlushTimeLimit(Duration.millis(180000))&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withFlushRowLimit()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Default value: 10,000 rows&lt;/li>
&lt;li>Limit of rows written to each staged file&lt;/li>
&lt;li>Example: &lt;code>.withFlushRowLimit(500000)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withShardsNumber()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Default value: 1 shard&lt;/li>
&lt;li>Number of files that will be saved in every flush (for purposes of parallel write).&lt;/li>
&lt;li>Example: &lt;code>.withShardsNumber(5)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withQuotationMark()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Default value: &lt;code>&amp;#39;&lt;/code> (single quotation mark).&lt;/li>
&lt;li>Accepts a String with one character, which surrounds all text (String) fields saved to CSV. It should be one of the characters accepted by &lt;a href="https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html">Snowflake’s&lt;/a> &lt;a href="https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html">FIELD_OPTIONALLY_ENCLOSED_BY&lt;/a> parameter (double quotation mark, single quotation mark or none).&lt;/li>
&lt;li>Example: &lt;code>.withQuotationMark(&amp;quot;&amp;quot;)&lt;/code> (no quotation marks)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withDebugMode()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Accepts:
&lt;ul>
&lt;li>&lt;code>SnowflakeIO.StreamingLogLevel.INFO&lt;/code> - shows full information about loaded files.&lt;/li>
&lt;li>&lt;code>SnowflakeIO.StreamingLogLevel.ERROR&lt;/code> - shows only errors.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Logs information about files streamed to Snowflake, similar to &lt;a href="https://docs.snowflake.com/en/user-guide/data-load-snowpipe-rest-apis.html#endpoint-insertreport">insertReport&lt;/a>. Enabling debug mode may affect performance.&lt;/li>
&lt;li>Example: &lt;code>.withDebugMode(SnowflakeIO.StreamingLogLevel.INFO)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Important notice&lt;/strong>:&lt;/p>
&lt;ol>
&lt;li>Streaming accepts only &lt;strong>key pair authentication&lt;/strong>. For details, see: &lt;a href="https://github.com/apache/beam/issues/21287">Issue 21287&lt;/a>.&lt;/li>
&lt;li>The role parameter configured in the &lt;code>SnowflakeIO.DataSourceConfiguration&lt;/code> object is ignored for streaming writes. For details, see: &lt;a href="https://github.com/apache/beam/issues/21365">Issue 21365&lt;/a>.&lt;/li>
&lt;/ol>
&lt;h4 id="flush-time-duration--number-of-rows">Flush time: duration &amp;amp; number of rows&lt;/h4>
&lt;p>Duration: the streaming write periodically writes files to the stage according to the duration specified in the flush time limit (for example, every 1 minute).&lt;/p>
&lt;p>Number of rows: files staged for writing contain the number of rows specified in the flush row limit, unless the flush time limit is reached first (for example, if the limit is 1000 rows, the buffer has collected 99 rows, and the 1-minute flush time passes, the rows are sent to SnowPipe for insertion).&lt;/p>
&lt;p>The size of staged files depends on the row size and the compression used (GZIP).&lt;/p>
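&lt;p>The interplay of the two limits can be sketched as a single decision: flush when the row limit is reached, or when the time limit has elapsed, whichever happens first. The code below is a plain-JDK illustration under that assumption; &lt;code>shouldFlush&lt;/code> is a hypothetical helper and does not reflect how SnowflakeIO implements buffering internally.&lt;/p>

```java
// Models the flush decision described above: row limit or time limit, whichever comes first.
class FlushSketch {

  // Hypothetical helper, not a SnowflakeIO API.
  static boolean shouldFlush(
      int bufferedRows, int flushRowLimit, long elapsedMillis, long flushTimeLimitMillis) {
    if (bufferedRows >= flushRowLimit) {
      return true; // the staged file is full
    }
    return elapsedMillis >= flushTimeLimitMillis; // the time limit has passed
  }

  public static void main(String[] args) {
    // The example from the text: limit of 1000 rows, 99 rows buffered, 1 minute elapsed.
    System.out.println(shouldFlush(99, 1000, 60000L, 60000L));
  }
}
```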
&lt;h3 id="userdatamapper-function">UserDataMapper function&lt;/h3>
&lt;p>The &lt;code>UserDataMapper&lt;/code> function is required to map data from a &lt;code>PCollection&lt;/code> to an array of String values before the &lt;code>write()&lt;/code> operation saves the data to temporary &lt;code>.csv&lt;/code> files. For example:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>public static SnowflakeIO.UserDataMapper&amp;lt;Long&amp;gt; getCsvMapper() {
return (SnowflakeIO.UserDataMapper&amp;lt;Long&amp;gt;) recordLine -&amp;gt; new String[] {recordLine.toString()};
}&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
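&lt;p>For records with more than one field, the mapping logic returns one array element per target column, in column order. The sketch below shows only that logic in plain JDK code; the &lt;code>Person&lt;/code> class and the &lt;code>mapRow&lt;/code> method are hypothetical. In a pipeline, the same body would be wrapped in a &lt;code>SnowflakeIO.UserDataMapper&lt;/code> lambda as in the example above.&lt;/p>

```java
import java.util.Arrays;

// Shows the row-to-String[] mapping that a UserDataMapper performs, without any Beam dependency.
class UserDataMapperSketch {

  // Hypothetical record type used only for this illustration.
  static class Person {
    final String name;
    final int age;

    Person(String name, int age) {
      this.name = name;
      this.age = age;
    }
  }

  // One array element per target column, in column order.
  static String[] mapRow(Person person) {
    return new String[] {person.name, String.valueOf(person.age)};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(mapRow(new Person("Ann", 30))));
  }
}
```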
&lt;h3 id="additional-write-options">Additional write options&lt;/h3>
&lt;h4 id="transformation-query">Transformation query&lt;/h4>
&lt;p>The &lt;code>.withQueryTransformation()&lt;/code> option for the &lt;code>write()&lt;/code> operation accepts a SQL query as a String value, which will be performed while transferring data staged in CSV files directly to the target Snowflake table. For information about the transformation SQL syntax, see the &lt;a href="https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html#transformation-parameters">Snowflake Documentation&lt;/a>.&lt;/p>
&lt;p>Usage:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>String query = &amp;#34;SELECT t.$1 from YOUR_TABLE;&amp;#34;;
data.apply(
SnowflakeIO.&amp;lt;~&amp;gt;write()
.withDataSourceConfiguration(dc)
.to(&amp;#34;MY_TABLE&amp;#34;)
.withStagingBucketName(&amp;#34;BUCKET&amp;#34;)
.withStorageIntegrationName(&amp;#34;STORAGE INTEGRATION NAME&amp;#34;)
.withUserDataMapper(mapper)
.withQueryTransformation(query)
)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h4 id="write-disposition">Write disposition&lt;/h4>
&lt;p>Define the write behaviour based on the table where data will be written to by specifying the &lt;code>.withWriteDisposition(...)&lt;/code> option for the &lt;code>write()&lt;/code> operation. The following values are supported:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>APPEND&lt;/code> - Default behaviour. Written data is added to the existing rows in the table.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>EMPTY&lt;/code> - The target table must be empty; otherwise, the write operation fails.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>TRUNCATE&lt;/code> - The write operation deletes all rows from the target table before writing to it.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Example of usage:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>data.apply(
SnowflakeIO.&amp;lt;~&amp;gt;write()
.withDataSourceConfiguration(dc)
.to(&amp;#34;MY_TABLE&amp;#34;)
.withStagingBucketName(&amp;#34;BUCKET&amp;#34;)
.withStorageIntegrationName(&amp;#34;STORAGE INTEGRATION NAME&amp;#34;)
.withUserDataMapper(mapper)
.withWriteDisposition(TRUNCATE)
)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
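&lt;p>The three write dispositions can be modelled on an in-memory table to make their differences explicit. The following plain-JDK sketch treats a table as an array of rows; the &lt;code>Disposition&lt;/code> enum and the &lt;code>write&lt;/code> method are hypothetical stand-ins that only mirror the behaviour described above and are not the SnowflakeIO types.&lt;/p>

```java
import java.util.Arrays;

// Models APPEND, EMPTY and TRUNCATE on an in-memory "table" (an array of rows).
class WriteDispositionSketch {

  // Hypothetical stand-in for the write disposition values listed above.
  enum Disposition { APPEND, EMPTY, TRUNCATE }

  // Returns the table contents after writing the incoming rows.
  static String[] write(String[] existing, String[] incoming, Disposition disposition) {
    if (disposition == Disposition.EMPTY) {
      if (existing.length != 0) {
        throw new IllegalStateException("Target table is not empty");
      }
    }
    if (disposition == Disposition.TRUNCATE) {
      return incoming.clone(); // existing rows are discarded first
    }
    // APPEND, or EMPTY with an empty table: keep existing rows and add the new ones.
    String[] result = Arrays.copyOf(existing, existing.length + incoming.length);
    System.arraycopy(incoming, 0, result, existing.length, incoming.length);
    return result;
  }

  public static void main(String[] args) {
    String[] table = {"row1"};
    String[] incoming = {"row2"};
    System.out.println(Arrays.toString(write(table, incoming, Disposition.APPEND)));
    System.out.println(Arrays.toString(write(table, incoming, Disposition.TRUNCATE)));
  }
}
```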
&lt;h4 id="create-disposition">Create disposition&lt;/h4>
&lt;p>The &lt;code>.withCreateDisposition()&lt;/code> option defines the behavior of the write operation if the target table does not exist. The following values are supported:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>CREATE_IF_NEEDED&lt;/code> - Default behaviour. The write operation checks whether the specified target table exists; if it does not, the write operation attempts to create the table. Specify the schema for the target table using the &lt;code>.withTableSchema()&lt;/code> option.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>CREATE_NEVER&lt;/code> - The write operation fails if the target table does not exist.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Usage:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>data.apply(
SnowflakeIO.&amp;lt;~&amp;gt;write()
.withDataSourceConfiguration(dc)
.to(&amp;#34;MY_TABLE&amp;#34;)
.withStagingBucketName(&amp;#34;BUCKET&amp;#34;)
.withStorageIntegrationName(&amp;#34;STORAGE INTEGRATION NAME&amp;#34;)
.withUserDataMapper(mapper)
.withCreateDisposition(CREATE_NEVER)
)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
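&lt;p>The create disposition reduces to a small decision based on whether the target table exists. The plain-JDK sketch below spells out that decision; the &lt;code>CreateMode&lt;/code> enum, the &lt;code>plan&lt;/code> method, and the returned labels are hypothetical and serve only to illustrate the behaviour described above.&lt;/p>

```java
// Spells out what happens for each create disposition, depending on whether the table exists.
class CreateDispositionSketch {

  // Hypothetical stand-in for the create disposition values listed above.
  enum CreateMode { CREATE_IF_NEEDED, CREATE_NEVER }

  // Returns a label describing the action taken by the write operation.
  static String plan(boolean tableExists, CreateMode mode) {
    if (tableExists) {
      return "WRITE";
    }
    if (mode == CreateMode.CREATE_IF_NEEDED) {
      return "CREATE_THEN_WRITE"; // the table schema must be available for this step
    }
    return "FAIL";
  }

  public static void main(String[] args) {
    System.out.println(plan(false, CreateMode.CREATE_IF_NEEDED));
    System.out.println(plan(false, CreateMode.CREATE_NEVER));
  }
}
```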
&lt;h4 id="table-schema-disposition">Table schema disposition&lt;/h4>
&lt;p>When the &lt;code>.withCreateDisposition()&lt;/code> option is set to &lt;code>CREATE_IF_NEEDED&lt;/code>, the &lt;code>.withTableSchema()&lt;/code> option enables specifying the schema for the created target table.
A table schema is a list of &lt;code>SnowflakeColumn&lt;/code> objects, each with a name and a type corresponding to the column type of one column in the table.&lt;/p>
&lt;p>Usage:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>SnowflakeTableSchema tableSchema =
new SnowflakeTableSchema(
SnowflakeColumn.of(&amp;#34;my_date&amp;#34;, new SnowflakeDate(), true),
new SnowflakeColumn(&amp;#34;id&amp;#34;, new SnowflakeNumber()),
SnowflakeColumn.of(&amp;#34;name&amp;#34;, new SnowflakeText(), true));
data.apply(
SnowflakeIO.&amp;lt;~&amp;gt;write()
.withDataSourceConfiguration(dc)
.to(&amp;#34;MY_TABLE&amp;#34;)
.withStagingBucketName(&amp;#34;BUCKET&amp;#34;)
.withStorageIntegrationName(&amp;#34;STORAGE INTEGRATION NAME&amp;#34;)
.withUserDataMapper(mapper)
.withTableSchema(tableSchema)
)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h2 id="reading-from-snowflake">Reading from Snowflake&lt;/h2>
&lt;p>SnowflakeIO can read Snowflake tables, either a full table by table name or custom data by query. The output of the read transform is a &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollection.html">PCollection&lt;/a> of a user-defined data type.&lt;/p>
&lt;h3 id="general-usage-1">General usage&lt;/h3>
&lt;p>The basic &lt;code>.read()&lt;/code> operation usage:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>PCollection&amp;lt;USER_DATA_TYPE&amp;gt; items = pipeline.apply(
SnowflakeIO.&amp;lt;USER_DATA_TYPE&amp;gt;read()
.withDataSourceConfiguration(dc)
.fromTable(&amp;#34;MY_TABLE&amp;#34;) // or .fromQuery(&amp;#34;QUERY&amp;#34;)
.withStagingBucketName(&amp;#34;BUCKET&amp;#34;)
.withStorageIntegrationName(&amp;#34;STORAGE INTEGRATION NAME&amp;#34;)
.withCsvMapper(mapper)
.withCoder(coder)
);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
All of the following parameters are required:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>.withDataSourceConfiguration(...)&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Accepts a DataSourceConfiguration object.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.fromTable(...) or .fromQuery(...)&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Specifies a Snowflake table name or custom SQL query.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withStagingBucketName()&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Accepts a cloud bucket name.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withStorageIntegrationName()&lt;/code>&lt;/p>
&lt;p>Accepts the name of a Snowflake storage integration object created according to the Snowflake documentation. Example:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>CREATE OR REPLACE STORAGE INTEGRATION test_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = (&amp;#39;gcs://bucket/&amp;#39;);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>CREATE STORAGE INTEGRATION test_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = &amp;#39;&amp;lt;ARN ROLE NAME&amp;gt;&amp;#39;
STORAGE_ALLOWED_LOCATIONS = (&amp;#39;s3://bucket/&amp;#39;);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
Then:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>.withStorageIntegrationName(&amp;#34;test_integration&amp;#34;)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withCsvMapper(mapper)&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Accepts a &lt;a href="/documentation/io/built-in/snowflake/#csvmapper">CSVMapper&lt;/a> instance for mapping String[] to USER_DATA_TYPE.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>.withCoder(coder)&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Accepts the &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/coders/Coder.html">Coder&lt;/a> for USER_DATA_TYPE.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Note&lt;/strong>:
SnowflakeIO uses &lt;code>COPY&lt;/code> statements behind the scenes to read (using &lt;a href="https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html">COPY to location&lt;/a>) files staged in cloud storage. The bucket given by &lt;code>StagingBucketName&lt;/code> is used as a temporary location for storing CSV files. The temporary directories are named &lt;code>sf_copy_csv_DATE_TIME_RANDOMSUFFIX&lt;/code> and are removed automatically once the read operation finishes.&lt;/p>
&lt;h3 id="csvmapper">CSVMapper&lt;/h3>
&lt;p>SnowflakeIO uses a &lt;a href="https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html">COPY INTO &amp;lt;location&amp;gt;&lt;/a> statement to move data from a Snowflake table to GCS/S3 as CSV files. These files are then downloaded via &lt;a href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/io/FileIO.html">FileIO&lt;/a> and processed line by line. Each line is split into an array of Strings using the &lt;a href="https://opencsv.sourceforge.net/">OpenCSV&lt;/a> library.&lt;/p>
&lt;p>The CSVMapper converts the array of Strings to a user-defined type, for example GenericRecord for Avro or Parquet files, or a custom POJO.&lt;/p>
&lt;p>Example implementation of CsvMapper for GenericRecord:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>static SnowflakeIO.CsvMapper&amp;lt;GenericRecord&amp;gt; getCsvMapper() {
return (SnowflakeIO.CsvMapper&amp;lt;GenericRecord&amp;gt;)
parts -&amp;gt; {
return new GenericRecordBuilder(PARQUET_SCHEMA)
.set(&amp;#34;ID&amp;#34;, Long.valueOf(parts[0]))
.set(&amp;#34;NAME&amp;#34;, parts[1])
[...]
.build();
};
}&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
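&lt;p>The same flow (split each CSV line into an array of strings, then hand the array to a mapper) can be sketched in Python. This is a minimal illustration, not the SnowflakeIO implementation: the standard &lt;code>csv&lt;/code> module stands in for OpenCSV, the &lt;code>User&lt;/code> type and &lt;code>map_lines&lt;/code> helper are hypothetical, and the double-quote quoting convention used here is an assumption about the staged files.&lt;/p>

```python
import csv
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical user-defined type used only for this sketch.
    id: int
    name: str


def csv_mapper(parts):
    # `parts` is the list of strings produced from one CSV line.
    return User(id=int(parts[0]), name=parts[1])


def map_lines(lines):
    # csv.reader splits each line and handles quoted fields
    # that contain the delimiter.
    return [csv_mapper(parts) for parts in csv.reader(lines)]


users = map_lines(["1,Ada", '2,"Hopper, Grace"'])
```

&lt;p>The mapper only ever sees strings, so every non-string field (such as the numeric id here) has to be converted explicitly.&lt;/p>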
&lt;h2 id="using-snowflakeio-with-aws-s3">Using SnowflakeIO with AWS S3&lt;/h2>
&lt;p>To use an AWS S3 bucket as the &lt;code>stagingBucketName&lt;/code>, you need to:&lt;/p>
&lt;ol>
&lt;li>Create a &lt;code>PipelineOptions&lt;/code> interface that &lt;a href="/documentation/io/built-in/snowflake/#extending-pipeline-options">extends&lt;/a> &lt;code>SnowflakePipelineOptions&lt;/code> and &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/aws2/options/S3Options.html">S3Options&lt;/a>
and adds the &lt;code>AwsAccessKey&lt;/code> and &lt;code>AwsSecretKey&lt;/code> options. Example:&lt;/li>
&lt;/ol>
&lt;p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>public interface AwsPipelineOptions extends SnowflakePipelineOptions, S3Options {
@Description(&amp;#34;AWS Access Key&amp;#34;)
@Default.String(&amp;#34;access_key&amp;#34;)
String getAwsAccessKey();
void setAwsAccessKey(String awsAccessKey);
@Description(&amp;#34;AWS secret key&amp;#34;)
@Default.String(&amp;#34;secret_key&amp;#34;)
String getAwsSecretKey();
void setAwsSecretKey(String awsSecretKey);
}&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
2. Set the &lt;code>AwsCredentialsProvider&lt;/code> option using the &lt;code>AwsAccessKey&lt;/code> and &lt;code>AwsSecretKey&lt;/code> options.&lt;/p>
&lt;p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>options.setAwsCredentialsProvider(
new AWSStaticCredentialsProvider(
new BasicAWSCredentials(options.getAwsAccessKey(), options.getAwsSecretKey())
)
);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
3. Create pipeline&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>Pipeline p = Pipeline.create(options);&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>&lt;strong>Note&lt;/strong>: Remember to set &lt;code>awsRegion&lt;/code> from &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/aws2/options/S3Options.html">S3Options&lt;/a>.&lt;/p>
&lt;h2 id="using-snowflakeio-in-python-sdk">Using SnowflakeIO in Python SDK&lt;/h2>
&lt;h3 id="intro">Intro&lt;/h3>
&lt;p>The Snowflake cross-language implementation supports both reading and writing in the Python SDK. It relies on
cross-language transforms, which are part of the &lt;a href="/roadmap/portability/">Portability Framework Roadmap&lt;/a>, an effort that aims to provide full interoperability
across the Beam ecosystem. For developers, this makes it possible to combine transforms written in different languages (Java/Python/Go).&lt;/p>
&lt;p>For more information about cross-language please see &lt;a href="/roadmap/connectors-multi-sdk/">multi sdk efforts&lt;/a>
and &lt;a href="/roadmap/connectors-multi-sdk/#cross-language-transforms-api-and-expansion-service">Cross-language transforms API and expansion service&lt;/a> articles.&lt;/p>
&lt;p>Additional resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/snowflake.py">SnowflakeIO source code&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/pydoc/2.56.0/apache_beam.io.snowflake.html">SnowflakeIO Pydoc&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://docs.snowflake.com/en">Snowflake documentation&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="reading-from-snowflake-1">Reading from Snowflake&lt;/h3>
&lt;p>SnowflakeIO can read Snowflake tables, either a full table by table name or custom data by query. The output of the read transform is a &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html#apache_beam.pvalue.PCollection">PCollection&lt;/a> of a user-defined data type.&lt;/p>
&lt;h4 id="general-usage-2">General usage&lt;/h4>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>OPTIONS = [&amp;#34;--runner=FlinkRunner&amp;#34;]
with TestPipeline(options=PipelineOptions(OPTIONS)) as p:
    (p
     | ReadFromSnowflake(...)
     | &amp;lt;FURTHER TRANSFORMS&amp;gt;)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h4 id="required-parameters">Required parameters&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;code>server_name&lt;/code> Full Snowflake server name with an account, zone, and domain.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>schema&lt;/code> Name of the Snowflake schema in the database to use.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>database&lt;/code> Name of the Snowflake database to use.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>staging_bucket_name&lt;/code> Name of the Google Cloud Storage bucket or AWS S3 bucket. The bucket is used as a temporary location for storing CSV files. The temporary directories are named &lt;code>sf_copy_csv_DATE_TIME_RANDOMSUFFIX&lt;/code> and are removed automatically once the read operation finishes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>storage_integration_name&lt;/code> Is the name of a Snowflake storage integration object created according to &lt;a href="https://docs.snowflake.net/manuals/sql-reference/sql/create-storage-integration.html">Snowflake documentation&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>csv_mapper&lt;/code> Specifies a function that translates an array of strings into a user-defined object. SnowflakeIO uses a &lt;a href="https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html">COPY INTO &amp;lt;location&amp;gt;&lt;/a> statement to move data from a Snowflake table to GCS/S3 as CSV files. These files are then downloaded via &lt;a href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/io/FileIO.html">FileIO&lt;/a> and processed line by line. Each line is split into an array of Strings using the &lt;a href="https://opencsv.sourceforge.net/">OpenCSV&lt;/a> library. The &lt;code>csv_mapper&lt;/code> function converts that array of Strings to a user-defined type, for example GenericRecord for Avro or Parquet files, or custom objects.
Example:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>def csv_mapper(strings_array):
    return User(strings_array[0], int(strings_array[1]))&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>table&lt;/code> or &lt;code>query&lt;/code> Specifies a Snowflake table name or custom SQL query.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="authentication-parameters">Authentication parameters&lt;/h4>
&lt;p>It’s required to pass one of the following combinations of valid parameters for authentication:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>username&lt;/code> and &lt;code>password&lt;/code> Specifies username and password for username/password authentication method.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>private_key_path&lt;/code> and &lt;code>private_key_passphrase&lt;/code> Specifies a path to private key and passphrase for key/pair authentication method.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>raw_private_key&lt;/code> and &lt;code>private_key_passphrase&lt;/code> Specifies a private key and passphrase for key/pair authentication method.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>o_auth_token&lt;/code> Specifies access token for OAuth authentication method.&lt;/p>
&lt;/li>
&lt;/ul>
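&lt;p>The authentication combinations listed above can be checked before a pipeline is built. The helper below is a hypothetical sketch and is not part of the Beam API; in particular, rejecting calls that supply more than one complete combination is an assumption made for this example.&lt;/p>

```python
def auth_method(username=None, password=None, private_key_path=None,
                raw_private_key=None, private_key_passphrase=None,
                o_auth_token=None):
    """Return a label for the single authentication combination supplied."""
    candidates = {
        "username/password": username is not None and password is not None,
        "key pair (path)": (private_key_path is not None
                            and private_key_passphrase is not None),
        "key pair (raw)": (raw_private_key is not None
                           and private_key_passphrase is not None),
        "oauth": o_auth_token is not None,
    }
    # Keep only the combinations whose parameters are all present.
    chosen = [name for name, present in candidates.items() if present]
    if len(chosen) != 1:
        raise ValueError(
            f"expected exactly one authentication method, got {chosen}")
    return chosen[0]
```

&lt;p>For example, &lt;code>auth_method(username="u", password="p")&lt;/code> returns &lt;code>"username/password"&lt;/code>, while passing only &lt;code>username&lt;/code> raises a &lt;code>ValueError&lt;/code>.&lt;/p>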
&lt;h4 id="additional-parameters">Additional parameters&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;code>role&lt;/code> specifies Snowflake role. If not specified the user&amp;rsquo;s default will be used.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>warehouse&lt;/code> specifies Snowflake warehouse name. If not specified the user&amp;rsquo;s default will be used.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>expansion_service&lt;/code> specifies URL of expansion service.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="writing-to-snowflake">Writing to Snowflake&lt;/h3>
&lt;p>One of the functions of SnowflakeIO is writing to Snowflake tables. This transformation enables you to finish the Beam pipeline with an output operation that sends the user&amp;rsquo;s &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html#apache_beam.pvalue.PCollection">PCollection&lt;/a> to your Snowflake database.&lt;/p>
&lt;h4 id="general-usage-3">General usage&lt;/h4>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>OPTIONS = [&amp;#34;--runner=FlinkRunner&amp;#34;]
with TestPipeline(options=PipelineOptions(OPTIONS)) as p:
    (p
     | &amp;lt;SOURCE OF DATA&amp;gt;
     | WriteToSnowflake(
         server_name=&amp;lt;SNOWFLAKE SERVER NAME&amp;gt;,
         username=&amp;lt;SNOWFLAKE USERNAME&amp;gt;,
         password=&amp;lt;SNOWFLAKE PASSWORD&amp;gt;,
         o_auth_token=&amp;lt;OAUTH TOKEN&amp;gt;,
         private_key_path=&amp;lt;PATH TO P8 FILE&amp;gt;,
         raw_private_key=&amp;lt;PRIVATE_KEY&amp;gt;,
         private_key_passphrase=&amp;lt;PASSWORD FOR KEY&amp;gt;,
         schema=&amp;lt;SNOWFLAKE SCHEMA&amp;gt;,
         database=&amp;lt;SNOWFLAKE DATABASE&amp;gt;,
         staging_bucket_name=&amp;lt;GCS OR S3 BUCKET&amp;gt;,
         storage_integration_name=&amp;lt;SNOWFLAKE STORAGE INTEGRATION NAME&amp;gt;,
         create_disposition=&amp;lt;CREATE DISPOSITION&amp;gt;,
         write_disposition=&amp;lt;WRITE DISPOSITION&amp;gt;,
         table_schema=&amp;lt;SNOWFLAKE TABLE SCHEMA&amp;gt;,
         user_data_mapper=&amp;lt;USER DATA MAPPER FUNCTION&amp;gt;,
         table=&amp;lt;SNOWFLAKE TABLE&amp;gt;,
         query=&amp;lt;IF NOT TABLE THEN QUERY&amp;gt;,
         role=&amp;lt;SNOWFLAKE ROLE&amp;gt;,
         warehouse=&amp;lt;SNOWFLAKE WAREHOUSE&amp;gt;,
         expansion_service=&amp;lt;EXPANSION SERVICE ADDRESS&amp;gt;))&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h4 id="required-parameters-1">Required parameters&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;code>server_name&lt;/code> Full Snowflake server name with account, zone and domain.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>schema&lt;/code> Name of the Snowflake schema in the database to use.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>database&lt;/code> Name of the Snowflake database to use.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>staging_bucket_name&lt;/code> Path to a Google Cloud Storage bucket or AWS S3 bucket, ending with a slash. The bucket is used to save the CSV files that will end up in Snowflake. Those CSV files are saved under the &lt;code>staging_bucket_name&lt;/code> path.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>storage_integration_name&lt;/code> Is the name of a Snowflake storage integration object created according to &lt;a href="https://docs.snowflake.net/manuals/sql-reference/sql/create-storage-integration.html">Snowflake documentation&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>user_data_mapper&lt;/code> Specifies a function which maps data from a PCollection to an array of String values before the write operation saves the data to temporary .csv files.
Example:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>def user_data_mapper(user):
    return [user.name, str(user.age)]&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>table&lt;/code> or &lt;code>query&lt;/code> Specifies a Snowflake table name or custom SQL query.&lt;/p>
&lt;/li>
&lt;/ul>
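&lt;p>To make the role of &lt;code>user_data_mapper&lt;/code> concrete, the sketch below maps objects to arrays of strings and serializes them as CSV text. It is an illustration only: the &lt;code>User&lt;/code> type and the &lt;code>to_csv&lt;/code> helper are hypothetical, and the standard &lt;code>csv&lt;/code> module stands in for whatever the connector uses to produce the staged files.&lt;/p>

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical element type of the PCollection being written.
    name: str
    age: int


def user_data_mapper(user):
    # Every value is converted to a string before it is written.
    return [user.name, str(user.age)]


def to_csv(users):
    # Serialize the mapped rows in memory; fields containing the
    # delimiter are quoted automatically by csv.writer.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for user in users:
        writer.writerow(user_data_mapper(user))
    return buffer.getvalue()
```

&lt;p>The order of the values returned by the mapper determines the column order in the CSV rows, so it should match the column order of the target table.&lt;/p>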
&lt;h4 id="authentication-parameters-1">Authentication parameters&lt;/h4>
&lt;p>It’s required to pass one of the following combinations of valid parameters for authentication:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>username&lt;/code> and &lt;code>password&lt;/code> Specifies username and password for username/password authentication method.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>private_key_path&lt;/code> and &lt;code>private_key_passphrase&lt;/code> Specifies a path to private key and passphrase for key/pair authentication method.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>raw_private_key&lt;/code> and &lt;code>private_key_passphrase&lt;/code> Specifies a private key and passphrase for key/pair authentication method.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>o_auth_token&lt;/code> Specifies access token for OAuth authentication method.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="additional-parameters-1">Additional parameters&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;code>role&lt;/code> specifies Snowflake role. If not specified the user&amp;rsquo;s default will be used.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>warehouse&lt;/code> specifies Snowflake warehouse name. If not specified the user&amp;rsquo;s default will be used.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>create_disposition&lt;/code> Defines the behaviour of the write operation if the target table does not exist. The following values are supported:&lt;/p>
&lt;ul>
&lt;li>&lt;code>CREATE_IF_NEEDED&lt;/code> - default behaviour. The write operation checks whether the specified target table exists; if it does not, the write operation attempts to create the table. Specify the schema for the target table using the table_schema parameter.&lt;/li>
&lt;li>&lt;code>CREATE_NEVER&lt;/code> - The write operation fails if the target table does not exist.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>write_disposition&lt;/code> Defines the write behaviour based on the table where data will be written to. The following values are supported:&lt;/p>
&lt;ul>
&lt;li>&lt;code>APPEND&lt;/code> - Default behaviour. Written data is added to the existing rows in the table,&lt;/li>
&lt;li>&lt;code>EMPTY&lt;/code> - The target table must be empty; otherwise, the write operation fails,&lt;/li>
&lt;li>&lt;code>TRUNCATE&lt;/code> - The write operation deletes all rows from the target table before writing to it.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>table_schema&lt;/code> When the &lt;code>create_disposition&lt;/code> parameter is set to CREATE_IF_NEEDED, the table_schema parameter enables specifying the schema for the created target table. A table schema is a JSON object whose &lt;code>schema&lt;/code> key holds an array of column definitions with the following structure:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>{&amp;#34;schema&amp;#34;: [
{
&amp;#34;dataType&amp;#34;:{&amp;#34;type&amp;#34;:&amp;#34;&amp;lt;COLUMN DATA TYPE&amp;gt;&amp;#34;},
&amp;#34;name&amp;#34;:&amp;#34;&amp;lt;COLUMN NAME&amp;gt;&amp;#34;,
&amp;#34;nullable&amp;#34;: &amp;lt;NULLABLE&amp;gt;
},
...
]}&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
All supported data types:
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>{&amp;#34;type&amp;#34;:&amp;#34;date&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;datetime&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;time&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;timestamp&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;timestamp_ltz&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;timestamp_ntz&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;timestamp_tz&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;boolean&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;decimal&amp;#34;,&amp;#34;precision&amp;#34;:38,&amp;#34;scale&amp;#34;:1},
{&amp;#34;type&amp;#34;:&amp;#34;double&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;float&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;integer&amp;#34;,&amp;#34;precision&amp;#34;:38,&amp;#34;scale&amp;#34;:0},
{&amp;#34;type&amp;#34;:&amp;#34;number&amp;#34;,&amp;#34;precision&amp;#34;:38,&amp;#34;scale&amp;#34;:1},
{&amp;#34;type&amp;#34;:&amp;#34;numeric&amp;#34;,&amp;#34;precision&amp;#34;:38,&amp;#34;scale&amp;#34;:2},
{&amp;#34;type&amp;#34;:&amp;#34;real&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;array&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;object&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;variant&amp;#34;},
{&amp;#34;type&amp;#34;:&amp;#34;binary&amp;#34;,&amp;#34;size&amp;#34;:null},
{&amp;#34;type&amp;#34;:&amp;#34;char&amp;#34;,&amp;#34;length&amp;#34;:1},
{&amp;#34;type&amp;#34;:&amp;#34;string&amp;#34;,&amp;#34;length&amp;#34;:null},
{&amp;#34;type&amp;#34;:&amp;#34;text&amp;#34;,&amp;#34;length&amp;#34;:null},
{&amp;#34;type&amp;#34;:&amp;#34;varbinary&amp;#34;,&amp;#34;size&amp;#34;:null},
{&amp;#34;type&amp;#34;:&amp;#34;varchar&amp;#34;,&amp;#34;length&amp;#34;:100}&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
You can read about Snowflake data types at &lt;a href="https://docs.snowflake.com/en/sql-reference/data-types.html">Snowflake data types&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>expansion_service&lt;/code> Specifies URL of expansion service.&lt;/p>
&lt;/li>
&lt;/ul>
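&lt;p>A &lt;code>table_schema&lt;/code> value in the structure shown above can be assembled with the standard &lt;code>json&lt;/code> module rather than written by hand. The &lt;code>column&lt;/code> and &lt;code>table_schema&lt;/code> helpers are hypothetical names introduced for this sketch, and it assumes the parameter is passed as a JSON string.&lt;/p>

```python
import json


def column(name, data_type, nullable=True, **type_options):
    # Extra keyword arguments (precision, scale, length, size)
    # are placed inside the dataType entry.
    data_type_entry = {"type": data_type}
    data_type_entry.update(type_options)
    return {"dataType": data_type_entry, "name": name, "nullable": nullable}


def table_schema(*columns):
    # Wrap the column definitions in the top-level "schema" key.
    return json.dumps({"schema": list(columns)})


schema_json = table_schema(
    column("id", "integer", nullable=False, precision=38, scale=0),
    column("name", "varchar", length=100),
    column("created", "date"),
)
```

&lt;p>Building the value this way avoids quoting mistakes and stray whitespace in column names.&lt;/p>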
&lt;h2 id="limitations">Limitations&lt;/h2>
&lt;p>SnowflakeIO currently has the following limitations.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Streaming writing supports only key pair authentication. For details, see: &lt;a href="https://github.com/apache/beam/issues/21287">Issue 21287&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The role parameter configured in &lt;code>SnowflakeIO.DataSourceConfiguration&lt;/code> object is ignored for streaming writing. For details, see: &lt;a href="https://github.com/apache/beam/issues/21365">Issue 21365&lt;/a>.&lt;/p>
&lt;/li>
&lt;/ol></description></item><item><title>Documentation: ApproximateQuantiles</title><link>/documentation/transforms/java/aggregation/approximatequantiles/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/java/aggregation/approximatequantiles/</guid><description>
&lt;h1 id="approximatequantiles">ApproximateQuantiles&lt;/h1>
&lt;table align="left">
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/ApproximateQuantiles.html">
&lt;img src="/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
&lt;/a>
&lt;/table>
&lt;br>&lt;br>
&lt;p>Takes a comparison function and the desired number of quantiles &lt;em>n&lt;/em>, either
globally or per-key. Using an approximation algorithm, it returns the
minimum value, &lt;em>n-2&lt;/em> intermediate values, and the maximum value.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>&lt;strong>Example&lt;/strong>: to compute the quartiles of a &lt;code>PCollection&lt;/code> of integers, we
would use &lt;code>ApproximateQuantiles.globally(5)&lt;/code>. This will produce a list
containing 5 values: the minimum value, Quartile 1 value, Quartile 2
value, Quartile 3 value, and the maximum value.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_ApproximateQuantiles"
data-show="main_section"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_ApproximateQuantiles%22%2c%22sdk%22%3a%22java%22%2c%22show%22%3a%22main_section%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
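&lt;p>The shape of the result (minimum, &lt;em>n-2&lt;/em> intermediate values, maximum) can be illustrated with a small Python sketch. Note that this version sorts the whole input and picks exact values, so it is not the approximation algorithm used by the transform; the &lt;code>quantiles&lt;/code> function and its index rule are assumptions made for illustration.&lt;/p>

```python
def quantiles(values, num_quantiles):
    """Return num_quantiles evenly spaced values from the sorted input.

    The first entry is the minimum and the last entry is the maximum.
    """
    if num_quantiles < 2:
        raise ValueError("num_quantiles must be at least 2")
    ordered = sorted(values)
    if not ordered:
        return []
    last = len(ordered) - 1
    # Pick evenly spaced positions between index 0 and the last index.
    return [ordered[i * last // (num_quantiles - 1)]
            for i in range(num_quantiles)]


quartiles = quantiles([9, 1, 8, 2, 7, 3, 6, 4, 5], 5)
```

&lt;p>For the nine integers above, &lt;code>quartiles&lt;/code> is &lt;code>[1, 3, 5, 7, 9]&lt;/code>: the minimum, three quartile values, and the maximum.&lt;/p>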
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/approximateunique">ApproximateUnique&lt;/a>
estimates the number of distinct elements or distinct values in key-value pairs&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/combine">Combine&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Documentation: ApproximateQuantiles</title><link>/documentation/transforms/python/aggregation/approximatequantiles/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/approximatequantiles/</guid><description>
&lt;h1 id="approximatequantiles">ApproximateQuantiles&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.stat.html#apache_beam.transforms.stat.ApproximateQuantile"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_ApproximateQuantiles"
data-show="approximatequantiles"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_ApproximateQuantiles%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22approximatequantiles%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2></description></item><item><title>Documentation: ApproximateUnique</title><link>/documentation/transforms/java/aggregation/approximateunique/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/java/aggregation/approximateunique/</guid><description>
&lt;h1 id="approximateunique">ApproximateUnique&lt;/h1>
&lt;table align="left">
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/ApproximateUnique.html">
&lt;img src="/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
&lt;/a>
&lt;/table>
&lt;br>&lt;br>
&lt;p>Transforms for estimating the number of distinct elements in a collection
or the number of distinct values associated with each key in a collection
of key-value pairs.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>See &lt;a href="https://issues.apache.org/jira/browse/BEAM-7703">BEAM-7703&lt;/a> for updates.&lt;/p>
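&lt;p>Until an official example is available, the idea of estimating a distinct count from a bounded sample can be sketched in Python. The code below uses a k-minimum-values estimator, which is one common technique and is not necessarily what the Beam transform does internally; the function names and the choice of SHA-256 for hashing are assumptions made for this illustration.&lt;/p>

```python
import hashlib
import heapq


def _hash_to_unit(value):
    # Map a value to a pseudo-random float in [0, 1).
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(values, sample_size):
    """Estimate the number of distinct values from the smallest hashes."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    kept = set()  # hashes currently in the sample
    heap = []     # max-heap (negated) holding the smallest hashes seen
    for value in values:
        h = _hash_to_unit(value)
        if h in kept:
            continue
        if len(heap) < sample_size:
            kept.add(h)
            heapq.heappush(heap, -h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the smaller new one.
            evicted = -heapq.heapreplace(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < sample_size:
        # Fewer distinct values than the sample size: the count is exact.
        return len(heap)
    # The k-th smallest hash sits near k / n, so n is about (k - 1) / hash.
    return round((sample_size - 1) / -heap[0])
```

&lt;p>A larger &lt;code>sample_size&lt;/code> gives a tighter estimate at the cost of more memory; when the input has fewer distinct values than the sample size, the result is exact.&lt;/p>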
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/hllcount">HllCount&lt;/a>
estimates the number of distinct elements and creates re-aggregatable sketches using the HyperLogLog++ algorithm.&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/count">Count&lt;/a>
counts the number of elements within each aggregation.&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/distinct">Distinct&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Documentation: ApproximateUnique</title><link>/documentation/transforms/python/aggregation/approximateunique/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/approximateunique/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="approximateunique">ApproximateUnique&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.stat.html#apache_beam.transforms.stat.ApproximateUnique"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_ApproximateUnique"
data-show="approximateunique"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_ApproximateUnique%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22approximateunique%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
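&lt;p>For intuition about how a fixed-size sample can estimate a distinct count, the following plain-Python sketch uses a k-minimum-values estimator. It is an illustration under stated assumptions, not the Beam implementation: the helper name &lt;code>approx_distinct&lt;/code>, the parameter &lt;code>k&lt;/code>, and the choice of SHA-256 hashing are all hypothetical.&lt;/p>

```python
import hashlib


def approx_distinct(items, k=256):
    """Estimate the number of distinct items with a k-minimum-values sketch.

    Hypothetical helper for illustration only; not part of the Beam API.
    """
    max_hash = float(2 ** 64)
    smallest = set()  # the k smallest 64-bit hashes seen so far
    for item in items:
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        smallest.add(int.from_bytes(digest[:8], "big"))
        if len(smallest) > k:
            smallest.remove(max(smallest))
    if k > len(smallest):
        # Fewer than k distinct hashes were seen, so the count is exact.
        return len(smallest)
    # Hashes are spread uniformly, so the k-th smallest hash tells us
    # how densely the distinct values fill the hash space.
    kth = max(smallest)
    return round((k - 1) * max_hash / kth)


values = [i % 5000 for i in range(20000)]  # 5000 distinct values
estimate = approx_distinct(values)
```

&lt;p>The sketch keeps at most &lt;code>k&lt;/code> hashes in memory regardless of input size. A larger &lt;code>k&lt;/code> trades memory for a tighter estimate.&lt;/p>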
&lt;h2 id="related-transforms">Related transforms&lt;/h2></description></item><item><title>Documentation: Auto Update ML models using WatchFilePattern</title><link>/documentation/ml/side-input-updates/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/ml/side-input-updates/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="use-watchfilepattern-to-auto-update-ml-models-in-runinference">Use WatchFilePattern to auto-update ML models in RunInference&lt;/h1>
&lt;p>The pipeline in this example uses a &lt;a href="https://beam.apache.org/documentation/transforms/python/elementwise/runinference/">RunInference&lt;/a> &lt;code>PTransform&lt;/code> to run inference on images using TensorFlow models. It uses a &lt;a href="https://beam.apache.org/documentation/programming-guide/#side-inputs">side input&lt;/a> &lt;code>PCollection&lt;/code> that emits &lt;code>ModelMetadata&lt;/code> to update the model.&lt;/p>
&lt;p>Using side inputs, you can update your model (which is passed in a &lt;code>ModelHandler&lt;/code> configuration object) in real-time, even while the Beam pipeline is still running. This can be done either by leveraging one of Beam&amp;rsquo;s provided patterns, such as the &lt;code>WatchFilePattern&lt;/code>,
or by configuring a custom side input &lt;code>PCollection&lt;/code> that defines the logic for the model update.&lt;/p>
&lt;p>For more information about side inputs, see the &lt;a href="https://beam.apache.org/documentation/programming-guide/#side-inputs">Side inputs&lt;/a> section in the Apache Beam Programming Guide.&lt;/p>
&lt;p>This example uses &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.utils.html#apache_beam.ml.inference.utils.WatchFilePattern">&lt;code>WatchFilePattern&lt;/code>&lt;/a> as a side input. &lt;code>WatchFilePattern&lt;/code> watches for file updates that match the &lt;code>file_pattern&lt;/code>
based on timestamps. It emits the latest &lt;a href="https://beam.apache.org/documentation/transforms/python/elementwise/runinference/">&lt;code>ModelMetadata&lt;/code>&lt;/a>, which is used in
the RunInference &lt;code>PTransform&lt;/code> to automatically update the ML model without stopping the Beam pipeline.&lt;/p>
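&lt;p>The idea of emitting the newest matching file can be sketched in plain Python. This is a conceptual illustration only, not how &lt;code>WatchFilePattern&lt;/code> is implemented: the &lt;code>ModelUpdate&lt;/code> type and the &lt;code>latest_updates&lt;/code> helper are hypothetical, and each snapshot stands in for one periodic listing of the file pattern.&lt;/p>

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class ModelUpdate(NamedTuple):
    """Hypothetical stand-in for the metadata a watcher would emit."""
    model_id: str
    model_name: str


def latest_updates(
    snapshots: Iterable[Iterable[Tuple[str, float]]]
) -> Iterator[ModelUpdate]:
    """Yield an update each time a newer file appears in a polled snapshot.

    Each snapshot is an iterable of (path, modified_time) pairs.
    """
    newest_seen: Optional[float] = None
    for snapshot in snapshots:
        files = list(snapshot)
        if not files:
            continue
        # Pick the most recently modified file in this listing.
        path, mtime = max(files, key=lambda f: f[1])
        if newest_seen is None or mtime > newest_seen:
            newest_seen = mtime
            yield ModelUpdate(model_id=path, model_name=path.rsplit("/", 1)[-1])


polls = [
    [("gs://b/m1.h5", 100.0)],
    [("gs://b/m1.h5", 100.0)],  # nothing new, so no update is emitted
    [("gs://b/m1.h5", 100.0), ("gs://b/m2.h5", 250.0)],
]
updates = list(latest_updates(polls))
```

&lt;p>Only the first and third polls produce an update, which mirrors the behavior described above: consumers see a new model path only when a newer file shows up.&lt;/p>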
&lt;h2 id="set-up-the-source">Set up the source&lt;/h2>
&lt;p>To read the image names, use a Pub/Sub topic as the source. The Pub/Sub topic emits a &lt;code>UTF-8&lt;/code> encoded image path that is used to read and preprocess the images for inference.&lt;/p>
&lt;h2 id="models-for-image-segmentation">Models for image classification&lt;/h2>
&lt;p>For the purpose of this example, use TensorFlow models saved in &lt;a href="https://www.tensorflow.org/tutorials/keras/save_and_load#hdf5_format">HDF5&lt;/a> format.&lt;/p>
&lt;h2 id="pre-process-images-for-inference">Pre-process images for inference&lt;/h2>
&lt;p>The Pub/Sub topic emits an image path. We need to read and preprocess the image to use it for RunInference. The &lt;code>read_image&lt;/code> function is used to read the image for inference.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">io&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">PIL&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Image&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.io.filesystems&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">FileSystems&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">numpy&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">tensorflow&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">tf&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">read_image&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">image_file_name&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">FileSystems&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">image_file_name&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">file&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Image&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">BytesIO&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">file&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">()))&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">convert&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;RGB&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">data&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">resize&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="mi">224&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">224&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">numpy&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">array&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="mf">255.0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img_tensor&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">cast&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">convert_to_tensor&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="o">...&lt;/span>&lt;span class="p">]),&lt;/span> &lt;span class="n">dtype&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">float32&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">img_tensor&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now, let&amp;rsquo;s jump into the pipeline code.&lt;/p>
&lt;p>&lt;strong>Pipeline steps&lt;/strong>:&lt;/p>
&lt;ol>
&lt;li>Get the image names from the Pub/Sub topic.&lt;/li>
&lt;li>Read and pre-process the images using the &lt;code>read_image&lt;/code> function.&lt;/li>
&lt;li>Pass the images to the RunInference &lt;code>PTransform&lt;/code>. RunInference takes &lt;code>model_handler&lt;/code> and &lt;code>model_metadata_pcoll&lt;/code> as input parameters.&lt;/li>
&lt;/ol>
&lt;p>For the &lt;a href="https://github.com/apache/beam/blob/07f52a478174f8733c7efedb7189955142faa5fa/sdks/python/apache_beam/ml/inference/base.py#L308">&lt;code>model_handler&lt;/code>&lt;/a>, we use &lt;a href="https://github.com/apache/beam/blob/186973b110d82838fb8e5ba27f0225a67c336591/sdks/python/apache_beam/ml/inference/tensorflow_inference.py#L184">TFModelHandlerTensor&lt;/a>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.ml.inference.tensorflow_inference&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">TFModelHandlerTensor&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># initialize TFModelHandlerTensor with a .h5 model saved in a directory accessible by the pipeline.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">tf_model_handler&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TFModelHandlerTensor&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">model_uri&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;gs://&amp;lt;your-bucket&amp;gt;/&amp;lt;model_path.h5&amp;gt;&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>model_metadata_pcoll&lt;/code> is a &lt;a href="https://beam.apache.org/documentation/programming-guide/#side-inputs">side input&lt;/a> &lt;code>PCollection&lt;/code> to the RunInference &lt;code>PTransform&lt;/code>. This side input updates the model in the &lt;code>model_handler&lt;/code> without stopping the Beam pipeline.
This example uses &lt;code>WatchFilePattern&lt;/code> as a side input to watch a glob pattern that matches &lt;code>.h5&lt;/code> files.&lt;/p>
&lt;p>&lt;code>model_metadata_pcoll&lt;/code> expects a &lt;code>PCollection&lt;/code> of &lt;code>ModelMetadata&lt;/code> objects compatible with &lt;a href="https://beam.apache.org/releases/pydoc/2.4.0/apache_beam.pvalue.html#apache_beam.pvalue.AsSingleton">AsSingleton&lt;/a>. Because the pipeline uses &lt;code>WatchFilePattern&lt;/code> as a side input, &lt;code>WatchFilePattern&lt;/code> takes care of windowing and wrapping the output into &lt;code>ModelMetadata&lt;/code>.&lt;/p>
&lt;p>After the pipeline starts processing data and when you see some outputs emitted from the RunInference &lt;code>PTransform&lt;/code>, upload a &lt;code>.h5&lt;/code> &lt;code>TensorFlow&lt;/code> model that matches the &lt;code>file_pattern&lt;/code> to the Google Cloud Storage bucket. RunInference will update the &lt;code>model_uri&lt;/code> of &lt;code>TFModelHandlerTensor&lt;/code> using &lt;code>WatchFilePattern&lt;/code> as a side input.&lt;/p>
&lt;p>&lt;strong>Note&lt;/strong>: The side input update frequency is non-deterministic, and the intervals between updates can be long.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.ml.inference.utils&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">WatchFilePattern&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.ml.inference.base&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">RunInference&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">file_pattern&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;gs://&amp;lt;your-bucket&amp;gt;/*.h5&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pubsub_topic&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;&amp;lt;topic_emitting_image_names&amp;gt;&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">side_input_pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;FilePatternUpdates&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">WatchFilePattern&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">file_pattern&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">file_pattern&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">images_pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;ReadFromPubSub&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromPubSub&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">topic&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">pubsub_topic&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;DecodeBytes&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">decode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;utf-8&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;PreProcessImage&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">read_image&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">inference_pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">images_pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;RunInference&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">RunInference&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">model_handler&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">tf_model_handler&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">model_metadata_pcoll&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">side_input_pcoll&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="post-process-the-predictionresult-object">Post-process the &lt;code>PredictionResult&lt;/code> object&lt;/h2>
&lt;p>When the inference is complete, RunInference outputs a &lt;code>PredictionResult&lt;/code> object that contains &lt;code>example&lt;/code>, &lt;code>inference&lt;/code>, and &lt;code>model_id&lt;/code> fields. The &lt;code>model_id&lt;/code> is used to identify which model is used for running the inference.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">typing&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.ml.inference.base&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">PredictionResult&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">PostProcessor&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> Process the PredictionResult to get the predicted label and model id used for inference.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">PredictionResult&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Iterable&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tuple&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">]]:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">predicted_class&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">numpy&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">argmax&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">inference&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">axis&lt;/span>&lt;span class="o">=-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">labels_path&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">keras&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">utils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get_file&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;ImageNetLabels.txt&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">imagenet_labels&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">numpy&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">array&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">labels_path&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">splitlines&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">predicted_class_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">imagenet_labels&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">predicted_class&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">predicted_class_name&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">title&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">model_id&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">post_processor_pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">inference_pcoll&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;PostProcessor&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">PostProcessor&lt;/span>&lt;span class="p">()))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="run-the-pipeline">Run the pipeline&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">run&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">wait_until_finish&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Note&lt;/strong>: The &lt;code>model_name&lt;/code> of the &lt;code>ModelMetadata&lt;/code> object is attached as a prefix to the &lt;a href="https://beam.apache.org/documentation/ml/runinference-metrics/">metrics&lt;/a> calculated by the RunInference &lt;code>PTransform&lt;/code>.&lt;/p>
&lt;h2 id="final-remarks">Final remarks&lt;/h2>
&lt;p>You can use this example as a pattern when using side inputs with the RunInference &lt;code>PTransform&lt;/code> to auto-update the models without stopping the pipeline. You can see a similar example for PyTorch on &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_image_classification_with_side_inputs.py">GitHub&lt;/a>.&lt;/p></description></item><item><title>Documentation: Basics of the Beam model</title><link>/documentation/basics/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/basics/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="basics-of-the-beam-model">Basics of the Beam model&lt;/h1>
&lt;p>Apache Beam is a unified model for defining both batch and streaming
data-parallel processing pipelines. To get started with Beam, you&amp;rsquo;ll need to
understand an important set of core concepts:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="#pipeline">&lt;em>Pipeline&lt;/em>&lt;/a> - A pipeline is a user-constructed graph of
transformations that defines the desired data processing operations.&lt;/li>
&lt;li>&lt;a href="#pcollection">&lt;em>PCollection&lt;/em>&lt;/a> - A &lt;code>PCollection&lt;/code> is a data set or data
stream. The data that a pipeline processes is part of a PCollection.&lt;/li>
&lt;li>&lt;a href="#ptransform">&lt;em>PTransform&lt;/em>&lt;/a> - A &lt;code>PTransform&lt;/code> (or &lt;em>transform&lt;/em>) represents a
data processing operation, or a step, in your pipeline. A transform is
applied to zero or more &lt;code>PCollection&lt;/code> objects, and produces zero or more
&lt;code>PCollection&lt;/code> objects.&lt;/li>
&lt;li>&lt;a href="#aggregation">&lt;em>Aggregation&lt;/em>&lt;/a> - Aggregation is computing a value from
multiple (1 or more) input elements.&lt;/li>
&lt;li>&lt;a href="#user-defined-function-udf">&lt;em>User-defined function (UDF)&lt;/em>&lt;/a> - Some Beam
operations allow you to run user-defined code as a way to configure the
transform.&lt;/li>
&lt;li>&lt;a href="#schema">&lt;em>Schema&lt;/em>&lt;/a> - A schema is a language-independent type definition for
a &lt;code>PCollection&lt;/code>. The schema for a &lt;code>PCollection&lt;/code> defines elements of that
&lt;code>PCollection&lt;/code> as an ordered list of named fields.&lt;/li>
&lt;li>&lt;a href="/documentation/sdks/java/">&lt;em>SDK&lt;/em>&lt;/a> - A language-specific library that lets
pipeline authors build transforms, construct their pipelines, and submit
them to a runner.&lt;/li>
&lt;li>&lt;a href="#runner">&lt;em>Runner&lt;/em>&lt;/a> - A runner runs a Beam pipeline using the capabilities of
your chosen data processing engine.&lt;/li>
&lt;li>&lt;a href="#window">&lt;em>Window&lt;/em>&lt;/a> - A &lt;code>PCollection&lt;/code> can be subdivided into windows based on
the timestamps of the individual elements. Windows enable grouping operations
over collections that grow over time by dividing the collection into windows
of finite collections.&lt;/li>
&lt;li>&lt;a href="#watermark">&lt;em>Watermark&lt;/em>&lt;/a> - A watermark is a guess as to when all data in a
certain window is expected to have arrived. This is needed because data isn’t
always guaranteed to arrive in a pipeline in time order, or to always arrive
at predictable intervals.&lt;/li>
&lt;li>&lt;a href="#trigger">&lt;em>Trigger&lt;/em>&lt;/a> - A trigger determines when to aggregate the results of
each window.&lt;/li>
&lt;li>&lt;a href="#state-and-timers">&lt;em>State and timers&lt;/em>&lt;/a> - Per-key state and timer callbacks
are lower level primitives that give you full control over aggregating input
collections that grow over time.&lt;/li>
&lt;li>&lt;a href="#splittable-dofn">&lt;em>Splittable DoFn&lt;/em>&lt;/a> - Splittable DoFns let you process
elements in a non-monolithic way. You can checkpoint the processing of an
element, and the runner can split the remaining work to yield additional
parallelism.&lt;/li>
&lt;/ul>
&lt;p>The following sections cover these concepts in more detail and provide links to
additional documentation.&lt;/p>
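&lt;p>As a small, concrete taste of one of these concepts, the following plain-Python sketch assigns timestamped elements to fixed-size windows. It does not use the Beam API. The helper &lt;code>assign_fixed_windows&lt;/code> and the sample events are hypothetical, and timestamps are simplified to integer seconds.&lt;/p>

```python
from collections import defaultdict


def assign_fixed_windows(elements, size):
    """Group (value, timestamp) pairs into [start, start + size) windows."""
    windows = defaultdict(list)
    for value, timestamp in elements:
        # Round the timestamp down to the start of its window.
        start = timestamp - (timestamp % size)
        windows[(start, start + size)].append(value)
    return dict(windows)


# Elements may arrive out of time order: "c" (t=59) arrives after "b" (t=61).
events = [("a", 3), ("b", 61), ("c", 59), ("d", 125)]
by_window = assign_fixed_windows(events, size=60)
```

&lt;p>Each window is a finite collection, even if the overall input keeps growing. That is what makes grouping over an ever-growing collection possible.&lt;/p>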
&lt;h2 id="pipeline">Pipeline&lt;/h2>
&lt;p>A Beam pipeline is a graph (specifically, a
&lt;a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">directed acyclic graph&lt;/a>)
of all the data and computations in your data processing task. This includes
reading input data, transforming that data, and writing output data. A pipeline
is constructed by a user in their SDK of choice. Then, the pipeline makes its
way to the runner either through the SDK directly or through the Runner API&amp;rsquo;s
RPC interface. For example, this diagram shows a branching pipeline:&lt;/p>
&lt;p>&lt;img src="/images/design-your-pipeline-multiple-pcollections.svg" alt="The pipeline applies two transforms to a single input collection. Each transform produces an output collection.">&lt;/p>
&lt;p>In this diagram, the boxes represent the parallel computations called
&lt;a href="#ptransform">&lt;em>PTransforms&lt;/em>&lt;/a> and the arrows with the circles represent the data
(in the form of &lt;a href="#pcollection">&lt;em>PCollections&lt;/em>&lt;/a>) that flows between the
transforms. The data might be bounded, stored data sets, or it might be
unbounded streams of data. In Beam, most transforms apply equally to bounded
and unbounded data.&lt;/p>
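&lt;p>To make the graph structure concrete, the following sketch models a branching pipeline as a directed acyclic graph using only the Python standard library. It is not Beam code, and the step names are hypothetical. They mirror a pipeline in which one input collection feeds two independent transforms.&lt;/p>

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps it consumes from.
# One input collection feeds two independent transforms.
graph = {
    "read_names": set(),
    "names_starting_with_a": {"read_names"},
    "names_starting_with_b": {"read_names"},
}

# A valid execution order visits every producer before its consumers.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

&lt;p>Because the two branches do not depend on each other, either may come second in the order. A runner is free to execute such independent branches in parallel.&lt;/p>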
&lt;p>Almost any computation that you can think of as a graph can be expressed as a
Beam pipeline. A Beam driver program typically starts by creating a &lt;code>Pipeline&lt;/code>
object, and then uses that object as the basis for creating the pipeline’s data
sets and its transforms.&lt;/p>
&lt;p>For more information about pipelines, see the following pages:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#overview">Beam Programming Guide: Overview&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#creating-a-pipeline">Beam Programming Guide: Creating a pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/pipelines/design-your-pipeline">Design your pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/pipelines/create-your-pipeline">Create your pipeline&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="pcollection">PCollection&lt;/h2>
&lt;p>A &lt;code>PCollection&lt;/code> is an unordered bag of elements. Each &lt;code>PCollection&lt;/code> is a
potentially distributed, homogeneous data set or data stream, and is owned by
the specific &lt;code>Pipeline&lt;/code> object for which it is created. Multiple pipelines
cannot share a &lt;code>PCollection&lt;/code>. Beam pipelines process PCollections, and the
runner is responsible for storing these elements.&lt;/p>
&lt;p>A &lt;code>PCollection&lt;/code> generally contains &amp;ldquo;big data&amp;rdquo; (too much data to fit in memory on
a single machine). Sometimes a small sample of data or an intermediate result
might fit into memory on a single machine, but Beam&amp;rsquo;s computational patterns and
transforms are focused on situations where distributed data-parallel computation
is required. Therefore, the elements of a &lt;code>PCollection&lt;/code> cannot be processed
individually, and are instead processed uniformly in parallel.&lt;/p>
&lt;p>The following characteristics of a &lt;code>PCollection&lt;/code> are important to know.&lt;/p>
&lt;p>&lt;strong>Bounded vs. unbounded&lt;/strong>:&lt;/p>
&lt;p>A &lt;code>PCollection&lt;/code> can be either bounded or unbounded.&lt;/p>
&lt;ul>
&lt;li>A &lt;em>bounded&lt;/em> &lt;code>PCollection&lt;/code> is a dataset of a known, fixed size (alternatively,
a dataset that is not growing over time). Bounded data can be processed by
batch pipelines.&lt;/li>
&lt;li>An &lt;em>unbounded&lt;/em> &lt;code>PCollection&lt;/code> is a dataset that grows over time, and the
elements are processed as they arrive. Unbounded data must be processed by
streaming pipelines.&lt;/li>
&lt;/ul>
&lt;p>These two categories derive from the intuitions of batch and stream processing,
but the two are unified in Beam and bounded and unbounded PCollections can
coexist in the same pipeline. If your runner can only support bounded
PCollections, you must reject pipelines that contain unbounded PCollections. If
your runner is only targeting streams, there are adapters in Beam&amp;rsquo;s support code
to convert everything to APIs that target unbounded data.&lt;/p>
&lt;p>&lt;strong>Timestamps&lt;/strong>:&lt;/p>
&lt;p>Every element in a &lt;code>PCollection&lt;/code> has a timestamp associated with it.&lt;/p>
&lt;p>When you execute a primitive connector to a storage system, that connector is
responsible for providing initial timestamps. The runner must propagate and
aggregate timestamps. If the timestamp is not important, such as with certain
batch processing jobs where elements do not denote events, the timestamp will be
the minimum representable timestamp, often referred to colloquially as &amp;ldquo;negative
infinity&amp;rdquo;.&lt;/p>
&lt;p>&lt;strong>Watermarks&lt;/strong>:&lt;/p>
&lt;p>Every &lt;code>PCollection&lt;/code> must have a &lt;a href="#watermark">watermark&lt;/a> that estimates how
complete the &lt;code>PCollection&lt;/code> is.&lt;/p>
&lt;p>The watermark is a guess that &amp;ldquo;we&amp;rsquo;ll never see an element with an earlier
timestamp&amp;rdquo;. Data sources are responsible for producing a watermark. The runner
must implement watermark propagation as PCollections are processed, merged, and
partitioned.&lt;/p>
&lt;p>The contents of a &lt;code>PCollection&lt;/code> are complete when a watermark advances to
&amp;ldquo;infinity&amp;rdquo;. In this manner, you can discover that an unbounded PCollection is
finite.&lt;/p>
&lt;p>&lt;strong>Windowed elements&lt;/strong>:&lt;/p>
&lt;p>Every element in a &lt;code>PCollection&lt;/code> resides in a &lt;a href="#window">window&lt;/a>. No element
resides in multiple windows; two elements can be equal except for their window,
but they are not the same.&lt;/p>
&lt;p>When elements are written to the outside world, they are effectively placed back
into the global window. Transforms that write data and don&amp;rsquo;t take this
perspective risk data loss.&lt;/p>
&lt;p>A window has a maximum timestamp. When the watermark exceeds the maximum
timestamp plus the user-specified allowed lateness, the window is expired. All
data related to an expired window might be discarded at any time.&lt;/p>
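&lt;p>As a rough illustration of the expiry rule above, the following plain-Python sketch checks whether a window is expired for a given watermark. It is not Beam API: the function name &lt;code>is_window_expired&lt;/code> and the use of integer seconds for timestamps are assumptions made for this example.&lt;/p>

```python
def is_window_expired(watermark, window_max_timestamp, allowed_lateness=0):
    """Return True once the watermark exceeds max timestamp + allowed lateness."""
    return watermark > window_max_timestamp + allowed_lateness


# A window whose maximum timestamp is 299 seconds, with 60 seconds of
# allowed lateness (both values are made up for illustration).
print(is_window_expired(watermark=300, window_max_timestamp=299, allowed_lateness=60))
print(is_window_expired(watermark=360, window_max_timestamp=299, allowed_lateness=60))
```

&lt;p>With these values, the first call reports that the window is still live, and the second reports that it is expired.&lt;/p>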
&lt;p>&lt;strong>Coder&lt;/strong>:&lt;/p>
&lt;p>Every &lt;code>PCollection&lt;/code> has a coder, which is a specification of the binary format
of the elements.&lt;/p>
&lt;p>In Beam, the user&amp;rsquo;s pipeline can be written in a language other than the
language of the runner. There is no expectation that the runner can actually
deserialize user data. The Beam model operates principally on encoded data,
&amp;ldquo;just bytes&amp;rdquo;. Each &lt;code>PCollection&lt;/code> has a declared encoding for its elements,
called a coder. A coder has a URN that identifies the encoding, and might have
additional sub-coders. For example, a coder for lists might contain a coder for
the elements of the list. Language-specific serialization techniques are
frequently used, but there are a few common key formats (such as key-value pairs
and timestamps) so the runner can understand them.&lt;/p>
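&lt;p>To make the idea of a coder with sub-coders more concrete, here is a minimal plain-Python sketch. The class names and the length-prefixed byte layout are assumptions chosen for illustration; they are not the Beam coder classes or the Beam wire format.&lt;/p>

```python
import struct


class LengthPrefixedStringCoder:
    """Encodes a str as a 4-byte big-endian length followed by UTF-8 bytes."""

    def encode(self, value):
        data = value.encode("utf-8")
        return struct.pack(">I", len(data)) + data

    def decode(self, buf, offset=0):
        (length,) = struct.unpack_from(">I", buf, offset)
        start = offset + 4
        end = start + length
        # Return the decoded value and the offset of the next unread byte.
        return buf[start:end].decode("utf-8"), end


class ListCoder:
    """Encodes a list by delegating each element to a sub-coder."""

    def __init__(self, element_coder):
        self.element_coder = element_coder

    def encode(self, values):
        parts = [struct.pack(">I", len(values))]
        parts.extend(self.element_coder.encode(v) for v in values)
        return b"".join(parts)

    def decode(self, buf, offset=0):
        (count,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        values = []
        for _ in range(count):
            value, offset = self.element_coder.decode(buf, offset)
            values.append(value)
        return values, offset


coder = ListCoder(LengthPrefixedStringCoder())
encoded = coder.encode(["beam", "runner"])
decoded, _ = coder.decode(encoded)
print(decoded)
```

&lt;p>Because the list coder only handles the element count and hands the rest to its sub-coder, the same class can wrap any element coder, including another list coder.&lt;/p>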
&lt;p>&lt;strong>Windowing strategy&lt;/strong>:&lt;/p>
&lt;p>Every &lt;code>PCollection&lt;/code> has a windowing strategy, which is a specification of
essential information for grouping and triggering operations. The &lt;code>Window&lt;/code>
transform sets up the windowing strategy, and the &lt;code>GroupByKey&lt;/code> transform has
behavior that is governed by the windowing strategy.&lt;/p>
&lt;br/>
&lt;p>For more information about PCollections, see the following page:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#pcollections">Beam Programming Guide: PCollections&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="ptransform">PTransform&lt;/h2>
&lt;p>A &lt;code>PTransform&lt;/code> (or transform) represents a data processing operation, or a step,
in your pipeline. A transform is usually applied to one or more input
&lt;code>PCollection&lt;/code> objects. Transforms that read input are an exception; these
transforms might not have an input &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>You provide transform processing logic in the form of a function object
(colloquially referred to as “user code”), and your user code is applied to each
element of the input PCollection (or more than one PCollection). Depending on
the pipeline runner and backend that you choose, many different workers across a
cluster might execute instances of your user code in parallel. The user code
that runs on each worker generates the output elements that are added to zero or
more output &lt;code>PCollection&lt;/code> objects.&lt;/p>
&lt;p>The Beam SDKs contain a number of different transforms that you can apply to
your pipeline’s PCollections. These include general-purpose core transforms,
such as &lt;code>ParDo&lt;/code> or &lt;code>Combine&lt;/code>. There are also pre-written composite transforms
included in the SDKs, which combine one or more of the core transforms in a
useful processing pattern, such as counting or combining elements in a
collection. You can also define your own more complex composite transforms to
fit your pipeline’s exact use case.&lt;/p>
&lt;p>The following list has some common transform types:&lt;/p>
&lt;ul>
&lt;li>Source transforms such as &lt;code>TextIO.Read&lt;/code> and &lt;code>Create&lt;/code>. A source transform
conceptually has no input.&lt;/li>
&lt;li>Processing and conversion operations such as &lt;code>ParDo&lt;/code>, &lt;code>GroupByKey&lt;/code>,
&lt;code>CoGroupByKey&lt;/code>, &lt;code>Combine&lt;/code>, and &lt;code>Count&lt;/code>.&lt;/li>
&lt;li>Outputting transforms such as &lt;code>TextIO.Write&lt;/code>.&lt;/li>
&lt;li>User-defined, application-specific composite transforms.&lt;/li>
&lt;/ul>
&lt;p>For more information about transforms, see the following pages:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#overview">Beam Programming Guide: Overview&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#transforms">Beam Programming Guide: Transforms&lt;/a>&lt;/li>
&lt;li>Beam transform catalog (&lt;a href="/documentation/transforms/java/overview/">Java&lt;/a>,
&lt;a href="/documentation/transforms/python/overview/">Python&lt;/a>)&lt;/li>
&lt;/ul>
&lt;h2 id="aggregation">Aggregation&lt;/h2>
&lt;p>Aggregation is computing a value from multiple (1 or more) input elements. In
Beam, the primary computational pattern for aggregation is to group all elements
with a common key and window, then combine each group of elements using an
associative and commutative operation. This is similar to the &amp;ldquo;Reduce&amp;rdquo; operation
in the &lt;a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce&lt;/a> model, though it is
enhanced to work with unbounded input streams as well as bounded data sets.&lt;/p>
&lt;img src="/images/aggregation.png" alt="Aggregation of elements." width="120px">
&lt;p>&lt;em>Figure 1: Aggregation of elements. Elements with the same color represent those
with a common key and window.&lt;/em>&lt;/p>
&lt;p>Some simple aggregation transforms include &lt;code>Count&lt;/code> (computes the count of all
elements in the aggregation), &lt;code>Max&lt;/code> (computes the maximum element in the
aggregation), and &lt;code>Sum&lt;/code> (computes the sum of all elements in the aggregation).&lt;/p>
&lt;p>When elements are grouped and emitted as a bag, the aggregation is known as
&lt;code>GroupByKey&lt;/code> (the associative/commutative operation is bag union). In this case,
the output is no smaller than the input. Often, you will apply an operation such
as summation, called a &lt;code>CombineFn&lt;/code>, in which the output is significantly smaller
than the input. In this case the aggregation is called &lt;code>CombinePerKey&lt;/code>.&lt;/p>
&lt;p>In a real application, you might have millions of keys and/or windows; that is
why this is still an &amp;ldquo;embarrassingly parallel&amp;rdquo; computational pattern. In those
cases where you have fewer keys, you can add parallelism by adding a
supplementary key, splitting each of your problem&amp;rsquo;s natural keys into many
sub-keys. After these sub-keys are aggregated, the results can be further
combined into a result for the original natural key for your problem. The
associativity of your aggregation function ensures that this yields the same
answer, but with more parallelism.&lt;/p>
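&lt;p>The sub-key technique can be sketched in plain Python as follows. This is only an illustration of the idea, not Beam code: the function names are hypothetical, summation stands in for the combining operation, and the round-robin shard assignment is an assumption (any assignment works, because the operation is associative and commutative).&lt;/p>

```python
from collections import defaultdict


def sum_per_key(pairs):
    """Group (key, value) pairs by key and sum each group."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def sum_per_key_with_subkeys(pairs, num_shards):
    """Same result as sum_per_key, computed in two stages via sub-keys."""
    # Stage 1: spread each natural key over several sub-keys and sum each one.
    sharded = [((key, index % num_shards), value)
               for index, (key, value) in enumerate(pairs)]
    partial = sum_per_key(sharded)
    # Stage 2: drop the shard component and combine the partial sums.
    return sum_per_key(
        (key, subtotal) for (key, _shard), subtotal in partial.items())


events = [("cat", 1), ("dog", 5), ("cat", 2), ("cat", 4), ("dog", 3)]
print(sum_per_key(events))
print(sum_per_key_with_subkeys(events, num_shards=3))
```

&lt;p>Both calls print the same totals per key. In the two-stage version, the first stage can run on up to &lt;code>num_shards&lt;/code> times as many independent groups.&lt;/p>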
&lt;p>When your input is unbounded, the computational pattern of grouping elements by
key and window is roughly the same, but governing when and how to emit the
results of aggregation involves three concepts:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="#window">Windowing&lt;/a>, which partitions your input into bounded subsets that
can be complete.&lt;/li>
&lt;li>&lt;a href="#watermark">Watermarks&lt;/a>, which estimate the completeness of your input.&lt;/li>
&lt;li>&lt;a href="#trigger">Triggers&lt;/a>, which govern when and how to emit aggregated results.&lt;/li>
&lt;/ul>
&lt;p>For more information about available aggregation transforms, see the following
pages:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#core-beam-transforms">Beam Programming Guide: Core Beam transforms&lt;/a>&lt;/li>
&lt;li>Beam Transform catalog
(&lt;a href="/documentation/transforms/java/overview/#aggregation">Java&lt;/a>,
&lt;a href="/documentation/transforms/python/overview/#aggregation">Python&lt;/a>)&lt;/li>
&lt;/ul>
&lt;h2 id="user-defined-function-udf">User-defined function (UDF)&lt;/h2>
&lt;p>Some Beam operations allow you to run user-defined code as a way to configure
the transform. For example, when using &lt;code>ParDo&lt;/code>, user-defined code specifies what
operation to apply to every element. For &lt;code>Combine&lt;/code>, it specifies how values
should be combined. By using &lt;a href="/documentation/patterns/cross-language/">cross-language transforms&lt;/a>,
a Beam pipeline can contain UDFs written in a different language, or even
multiple languages in the same pipeline.&lt;/p>
&lt;p>Beam has several varieties of UDFs:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#pardo">&lt;em>DoFn&lt;/em>&lt;/a> - per-element processing
function (used in &lt;code>ParDo&lt;/code>)&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#setting-your-pcollections-windowing-function">&lt;em>WindowFn&lt;/em>&lt;/a> -
places elements in windows and merges windows (used in &lt;code>Window&lt;/code> and
&lt;code>GroupByKey&lt;/code>)&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#side-inputs">&lt;em>ViewFn&lt;/em>&lt;/a> - adapts a
materialized &lt;code>PCollection&lt;/code> to a particular interface (used in side inputs)&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#side-inputs-windowing">&lt;em>WindowMappingFn&lt;/em>&lt;/a> -
maps one element&amp;rsquo;s window to another, and specifies bounds on how far in the
past the result window will be (used in side inputs)&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#combine">&lt;em>CombineFn&lt;/em>&lt;/a> - associative and
commutative aggregation (used in &lt;code>Combine&lt;/code> and state)&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#data-encoding-and-type-safety">&lt;em>Coder&lt;/em>&lt;/a> -
encodes user data; some coders have standard formats and are not really UDFs&lt;/li>
&lt;/ul>
&lt;p>Each language SDK has its own idiomatic way of expressing the user-defined
functions in Beam, but there are common requirements. When you build user code
for a Beam transform, you should keep in mind the distributed nature of
execution. For example, there might be many copies of your function running on a
lot of different machines in parallel, and those copies function independently,
without communicating or sharing state with any of the other copies. Each copy
of your user code function might be retried or run multiple times, depending on
the pipeline runner and the processing backend that you choose for your
pipeline. Beam also supports stateful processing through the
&lt;a href="/blog/stateful-processing/">stateful processing API&lt;/a>.&lt;/p>
&lt;p>For more information about user-defined functions, see the following pages:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#requirements-for-writing-user-code-for-beam-transforms">Requirements for writing user code for Beam transforms&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#pardo">Beam Programming Guide: ParDo&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#setting-your-pcollections-windowing-function">Beam Programming Guide: WindowFn&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#combine">Beam Programming Guide: CombineFn&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#data-encoding-and-type-safety">Beam Programming Guide: Coder&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#side-inputs">Beam Programming Guide: Side inputs&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="schema">Schema&lt;/h2>
&lt;p>A schema is a language-independent type definition for a &lt;code>PCollection&lt;/code>. The
schema for a &lt;code>PCollection&lt;/code> defines elements of that &lt;code>PCollection&lt;/code> as an ordered
list of named fields. Each field has a name, a type, and possibly a set of user
options.&lt;/p>
&lt;p>In many cases, the element type in a &lt;code>PCollection&lt;/code> has a structure that can be
introspected. Some examples are JSON, Protocol Buffer, Avro, and database row
objects. All of these formats can be converted to Beam Schemas. Even within an
SDK pipeline, simple Java POJOs (or equivalent structures in other languages)
are often used as intermediate types, and these also have a clear structure that
can be inferred by inspecting the class. By understanding the structure of a
pipeline’s records, we can provide much more concise APIs for data processing.&lt;/p>
&lt;p>Beam provides a collection of transforms that operate natively on schemas. For
example, &lt;a href="/documentation/dsls/sql/overview/">Beam SQL&lt;/a> is a common transform
that operates on schemas. These transforms allow selections and aggregations in
terms of named schema fields. Another advantage of schemas is that they allow
referencing of element fields by name. Beam provides a selection syntax for
referencing fields, including nested and repeated fields.&lt;/p>
&lt;p>For more information about schemas, see the following pages:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#schemas">Beam Programming Guide: Schemas&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/patterns/schema/">Schema Patterns&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="runner">Runner&lt;/h2>
&lt;p>A Beam runner runs a Beam pipeline on a specific platform. Most runners are
translators or adapters to massively parallel big data processing systems, such
as Apache Flink, Apache Spark, Google Cloud Dataflow, and more. For example, the
Flink runner translates a Beam pipeline into a Flink job. The Direct Runner runs
pipelines locally so you can test, debug, and validate that your pipeline
adheres to the Apache Beam model as closely as possible.&lt;/p>
&lt;p>For an up-to-date list of Beam runners and which features of the Apache Beam
model they support, see the runner
&lt;a href="/documentation/runners/capability-matrix/">capability matrix&lt;/a>.&lt;/p>
&lt;p>For more information about runners, see the following pages:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/#choosing-a-runner">Choosing a Runner&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/runners/capability-matrix/">Beam Capability Matrix&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="window">Window&lt;/h2>
&lt;p>Windowing subdivides a &lt;code>PCollection&lt;/code> into &lt;em>windows&lt;/em> according to the timestamps
of its individual elements. Windows enable grouping operations over unbounded
collections by dividing the collection into windows of finite collections.&lt;/p>
&lt;p>A &lt;em>windowing function&lt;/em> tells the runner how to assign elements to one or more
initial windows, and how to merge windows of grouped elements. Each element in a
&lt;code>PCollection&lt;/code> can only be in one window, so if a windowing function specifies
multiple windows for an element, the element is conceptually duplicated into
each of the windows and each element is identical except for its window.&lt;/p>
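&lt;p>The conceptual duplication described above can be sketched in plain Python with sliding windows. The helper names are hypothetical, timestamps are integer seconds, and windows are modeled as half-open &lt;code>(start, end)&lt;/code> tuples; none of this is Beam API.&lt;/p>

```python
def assign_sliding_windows(timestamp, size, period):
    """Return every [start, end) window of length `size` containing `timestamp`.

    Window starts are assumed to be multiples of `period`.
    """
    last_start = timestamp - (timestamp % period)
    return [(start, start + size)
            for start in range(last_start, timestamp - size, -period)]


def window_element(value, timestamp, size, period):
    """Duplicate one element into each window it is assigned to."""
    return [(value, timestamp, window)
            for window in assign_sliding_windows(timestamp, size, period)]


# One element at t=70s, with 60-second windows that start every 30 seconds.
print(window_element("click", 70, size=60, period=30))
# prints [('click', 70, (60, 120)), ('click', 70, (30, 90))]
```

&lt;p>The two output tuples carry the same value and timestamp and differ only in their window, which is what lets each copy reside in exactly one window.&lt;/p>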
&lt;p>Transforms that aggregate multiple elements, such as &lt;code>GroupByKey&lt;/code> and &lt;code>Combine&lt;/code>,
work implicitly on a per-window basis; they process each &lt;code>PCollection&lt;/code> as a
succession of multiple, finite windows, though the entire collection itself may
be of unbounded size.&lt;/p>
&lt;p>Beam provides several windowing functions:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Fixed time windows&lt;/strong> (also known as &amp;ldquo;tumbling windows&amp;rdquo;) represent a consistent
duration, non-overlapping time interval in the data stream.&lt;/li>
&lt;li>&lt;strong>Sliding time windows&lt;/strong> (also known as &amp;ldquo;hopping windows&amp;rdquo;) also represent time
intervals in the data stream; however, sliding time windows can overlap.&lt;/li>
&lt;li>&lt;strong>Per-session windows&lt;/strong> define windows that contain elements that are within a
certain gap duration of another element.&lt;/li>
&lt;li>&lt;strong>Single global window&lt;/strong>: by default, all data in a &lt;code>PCollection&lt;/code> is assigned to
the single global window, and late data is discarded.&lt;/li>
&lt;li>&lt;strong>Calendar-based windows&lt;/strong> (not supported by the Beam SDK for Python)&lt;/li>
&lt;/ul>
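&lt;p>Session windows are the one windowing function in this list that relies on merging. A minimal plain-Python sketch of that merge step, for the timestamps of a single key, might look like the following. The function name is hypothetical, and each element is assumed to start in its own window of length &lt;code>gap&lt;/code> before overlapping windows are merged.&lt;/p>

```python
def merge_session_windows(timestamps, gap):
    """Merge per-element windows [t, t + gap) that overlap into sessions."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and sessions[-1][1] > t:
            # The new element falls inside the current session: extend it.
            start, end = sessions[-1]
            sessions[-1] = (start, max(end, t + gap))
        else:
            # Otherwise the element opens a new session.
            sessions.append((t, t + gap))
    return sessions


print(merge_session_windows([0, 20, 45, 200, 210], gap=30))
# prints [(0, 75), (200, 240)]
```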
&lt;p>You can also define your own windowing function if you have more complex
requirements.&lt;/p>
&lt;p>For example, let&amp;rsquo;s say we have a &lt;code>PCollection&lt;/code> that uses fixed-time windowing,
with windows that are five minutes long. For each window, Beam must collect all
the data with an event time timestamp in the given window range (between 0:00
and 4:59 in the first window, for instance). Data with timestamps outside that
range (data from 5:00 or later) belongs to a different window.&lt;/p>
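&lt;p>The five-minute example can be written out as a small plain-Python sketch. The helper names are hypothetical, timestamps are integer seconds, and each window is modeled as a half-open &lt;code>(start, end)&lt;/code> tuple, so the first window covers 0:00 through 4:59.&lt;/p>

```python
from collections import defaultdict

FIVE_MINUTES = 5 * 60


def assign_fixed_window(timestamp, size):
    """Return the single [start, end) fixed window that contains `timestamp`."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def group_by_fixed_window(timestamps, size):
    """Collect event timestamps under the window they belong to."""
    groups = defaultdict(list)
    for timestamp in timestamps:
        groups[assign_fixed_window(timestamp, size)].append(timestamp)
    return dict(groups)


# 0:30 and 4:59 share the first window; 5:00 starts the second one.
print(group_by_fixed_window([30, 299, 300, 610], FIVE_MINUTES))
# prints {(0, 300): [30, 299], (300, 600): [300], (600, 900): [610]}
```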
&lt;p>Two concepts are closely related to windowing and covered in the following
sections: &lt;a href="#watermark">watermarks&lt;/a> and &lt;a href="#trigger">triggers&lt;/a>.&lt;/p>
&lt;p>For more information about windows, see the following pages:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#windowing">Beam Programming Guide: Windowing&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#setting-your-pcollections-windowing-function">Beam Programming Guide: WindowFn&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="watermark">Watermark&lt;/h2>
&lt;p>In any data processing system, there is a certain amount of lag between the time
a data event occurs (the “event time”, determined by the timestamp on the data
element itself) and the time the actual data element gets processed at any stage
in your pipeline (the “processing time”, determined by the clock on the system
processing the element). In addition, data isn’t always guaranteed to arrive in
a pipeline in time order, or to always arrive at predictable intervals. For
example, you might have intermediate systems that don&amp;rsquo;t preserve order, or you
might have two servers that timestamp data but one has a better network
connection.&lt;/p>
&lt;p>To address this potential unpredictability, Beam tracks a &lt;em>watermark&lt;/em>. A
watermark is a guess as to when all data in a certain window is expected to have
arrived in the pipeline. You can also think of this as “we’ll never see an
element with an earlier timestamp”.&lt;/p>
&lt;p>Data sources are responsible for producing a watermark, and every &lt;code>PCollection&lt;/code>
must have a watermark that estimates how complete the &lt;code>PCollection&lt;/code> is. The
contents of a &lt;code>PCollection&lt;/code> are complete when a watermark advances to
“infinity”. In this manner, you might discover that an unbounded &lt;code>PCollection&lt;/code>
is finite. After the watermark progresses past the end of a window, any further
element that arrives with a timestamp in that window is considered &lt;em>late data&lt;/em>.&lt;/p>
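&lt;p>The relationship between the watermark and late data can be illustrated with a plain-Python sketch. Everything here is an assumption made for the example: the function name is hypothetical, fixed windows are used, and the watermark is a simple heuristic (the largest timestamp seen so far minus a fixed skew). Real sources estimate their watermarks in source-specific ways.&lt;/p>

```python
def classify_arrivals(arrivals, window_size, max_skew):
    """Label each arriving event timestamp as 'on time' or 'late'.

    `arrivals` lists event timestamps in the order they reach the pipeline.
    An element is late if the watermark had already passed the end of its
    fixed window when the element arrived.
    """
    watermark = float("-inf")
    labels = []
    for timestamp in arrivals:
        window_end = timestamp - (timestamp % window_size) + window_size
        late = watermark >= window_end
        labels.append((timestamp, "late" if late else "on time"))
        # Heuristic watermark: never moves backward.
        watermark = max(watermark, timestamp - max_skew)
    return labels


for label in classify_arrivals([10, 50, 75, 130, 40, 125], window_size=60, max_skew=5):
    print(label)
```

&lt;p>In this run, only the element with timestamp 40 is labeled late: it arrives after the watermark has reached 125, which is past the end of its window at 60.&lt;/p>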
&lt;p>&lt;a href="#trigger">Triggers&lt;/a> are a related concept that allow you to modify and refine
the windowing strategy for a &lt;code>PCollection&lt;/code>. You can use triggers to decide when
each individual window aggregates and reports its results, including how the
window emits late elements.&lt;/p>
&lt;p>For more information about watermarks, see the following page:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#watermarks-and-late-data">Beam Programming Guide: Watermarks and late data&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="trigger">Trigger&lt;/h2>
&lt;p>When collecting and grouping data into windows, Beam uses &lt;em>triggers&lt;/em> to
determine when to emit the aggregated results of each window (referred to as a
&lt;em>pane&lt;/em>). If you use Beam’s default windowing configuration and default trigger,
Beam outputs the aggregated result when it estimates all data has arrived, and
discards all subsequent data for that window.&lt;/p>
&lt;p>At a high level, triggers provide two additional capabilities compared to
outputting at the end of a window:&lt;/p>
&lt;ol>
&lt;li>Triggers allow Beam to emit early results, before all the data in a given
window has arrived. For example, emitting after a certain amount of time
elapses, or after a certain number of elements arrives.&lt;/li>
&lt;li>Triggers allow processing of late data by triggering after the event time
watermark passes the end of the window.&lt;/li>
&lt;/ol>
&lt;p>These capabilities allow you to control the flow of your data and also balance
between data completeness, latency, and cost.&lt;/p>
&lt;p>Beam provides a number of pre-built triggers that you can set:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Event time triggers&lt;/strong>: These triggers operate on the event time, as
indicated by the timestamp on each data element. Beam’s default trigger is
event time-based.&lt;/li>
&lt;li>&lt;strong>Processing time triggers&lt;/strong>: These triggers operate on the processing time,
which is the time when the data element is processed at any given stage in
the pipeline.&lt;/li>
&lt;li>&lt;strong>Data-driven triggers&lt;/strong>: These triggers operate by examining the data as it
arrives in each window, and firing when that data meets a certain property.
Currently, data-driven triggers only support firing after a certain number of
data elements.&lt;/li>
&lt;li>&lt;strong>Composite triggers&lt;/strong>: These triggers combine multiple triggers in various
ways. For example, you might want one trigger for early data and a different
trigger for late data.&lt;/li>
&lt;/ul>
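&lt;p>As a rough sketch of how a data-driven trigger produces early panes, the following plain-Python function emits a running sum after every &lt;code>count&lt;/code> elements of one window and then a final pane. The function name and pane labels are hypothetical. The sketch assumes that panes accumulate (each pane includes all earlier elements) and uses the end of the input list to stand in for the watermark passing the end of the window.&lt;/p>

```python
def panes_with_count_trigger(elements, count):
    """Emit an early pane every `count` elements, then one final pane."""
    panes = []
    total = 0
    pending = 0
    for value in elements:
        total += value
        pending += 1
        if pending == count:
            # The element-count condition is met: fire an early pane.
            panes.append(("early", total))
            pending = 0
    # End of input stands in for the watermark passing the end of the window.
    panes.append(("on time", total))
    return panes


print(panes_with_count_trigger([1, 2, 3, 4, 5], count=2))
# prints [('early', 3), ('early', 10), ('on time', 15)]
```

&lt;p>A smaller &lt;code>count&lt;/code> lowers latency at the cost of emitting more panes, which is the completeness, latency, and cost trade-off that triggers expose.&lt;/p>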
&lt;p>For more information about triggers, see the following page:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#triggers">Beam Programming Guide: Triggers&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="state-and-timers">State and timers&lt;/h2>
&lt;p>Beam’s windowing and triggers provide an abstraction for grouping and
aggregating unbounded input data based on timestamps. However, there are
aggregation use cases that might require an even higher degree of control. State
and timers are two important concepts that help with these use cases. Like
other aggregations, state and timers are processed per window.&lt;/p>
&lt;p>&lt;strong>State&lt;/strong>:&lt;/p>
&lt;p>Beam provides the State API for manually managing per-key state, allowing for
fine-grained control over aggregations. The State API lets you augment
element-wise operations (for example, &lt;code>ParDo&lt;/code> or &lt;code>Map&lt;/code>) with mutable state. Like
other aggregations, state is processed per window.&lt;/p>
&lt;p>The State API models state per key. To use the state API, you start out with a
keyed &lt;code>PCollection&lt;/code>. A &lt;code>ParDo&lt;/code> that processes this &lt;code>PCollection&lt;/code> can declare
persistent state variables. When you process each element inside the &lt;code>ParDo&lt;/code>,
you can use the state variables to write or update state for the current key or
to read previous state written for that key. State is always scoped to the
current processing key.&lt;/p>
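&lt;p>The scoping of state can be mimicked in plain Python. The class below is a hypothetical stand-in for a stateful per-element function, not the Beam State API: it keeps one counter per key and window in an in-memory dictionary, whereas a runner would store and partition that state for you.&lt;/p>

```python
from collections import defaultdict


class RunningCountFn:
    """Keeps one counter per (key, window) and tags each element with it."""

    def __init__(self):
        # Maps (key, window) to the number of elements seen so far.
        self._counts = defaultdict(int)

    def process(self, key, window, value):
        self._counts[(key, window)] += 1
        return (key, window, value, self._counts[(key, window)])


fn = RunningCountFn()
inputs = [("a", 0, "x"), ("b", 0, "y"), ("a", 0, "z"), ("a", 1, "w")]
for key, window, value in inputs:
    print(fn.process(key, window, value))
```

&lt;p>The third element sees a count of 2 because it shares both key and window with the first. The fourth starts again at 1 because it belongs to a different window.&lt;/p>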
&lt;p>Beam provides several types of state, though different runners might support a
different subset of these states.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>ValueState&lt;/strong>: ValueState is a scalar state value. For each key in the
input, a ValueState stores a typed value that can be read and modified inside
the &lt;code>DoFn&lt;/code>.&lt;/li>
&lt;li>A common use case for state is to accumulate multiple elements into a group:
&lt;ul>
&lt;li>&lt;strong>BagState&lt;/strong>: BagState allows you to accumulate elements in an unordered
bag. This lets you add elements to a collection without needing to read any
of the previously accumulated elements.&lt;/li>
&lt;li>&lt;strong>MapState&lt;/strong>: MapState allows you to accumulate elements in a map.&lt;/li>
&lt;li>&lt;strong>SetState&lt;/strong>: SetState allows you to accumulate elements in a set.&lt;/li>
&lt;li>&lt;strong>OrderedListState&lt;/strong>: OrderedListState allows you to accumulate elements in
a timestamp-sorted list.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>CombiningState&lt;/strong>: CombiningState allows you to create a state object that
is updated using a Beam combiner. Like BagState, you can add elements to an
aggregation without needing to read the current value, and the accumulator
can be compacted using a combiner.&lt;/li>
&lt;/ul>
&lt;p>You can use the State API together with the Timer API to create processing tasks
that give you fine-grained control over the workflow.&lt;/p>
&lt;p>&lt;strong>Timers&lt;/strong>:&lt;/p>
&lt;p>Beam provides a per-key timer callback API that enables delayed processing of
data stored using the State API. The Timer API lets you set timers to call back
at either an event-time or a processing-time timestamp. For more advanced use
cases, your timer callback can set another timer. Like other aggregations,
timers are processed per window. You can use the timer API together with the
State API to create processing tasks that give you fine-grained control over the
workflow.&lt;/p>
&lt;p>The following timers are available:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Event-time timers&lt;/strong>: Event-time timers fire when the input watermark for
the &lt;code>DoFn&lt;/code> passes the time at which the timer is set, meaning that the runner
believes that there are no more elements to be processed with timestamps
before the timer timestamp. This allows for event-time aggregations.&lt;/li>
&lt;li>&lt;strong>Processing-time timers&lt;/strong>: Processing-time timers fire when the real wall-clock
time passes the time at which the timer is set. This is often used to create larger batches of data before
processing. It can also be used to schedule events that should occur at a
specific time.&lt;/li>
&lt;li>&lt;strong>Dynamic timer tags&lt;/strong>: Beam also supports dynamically setting a timer tag. This
allows you to set multiple different timers in a &lt;code>DoFn&lt;/code> and dynamically
choose timer tags (for example, based on data in the input elements).&lt;/li>
&lt;/ul>
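&lt;p>The behavior of event-time timers can be approximated with a plain-Python sketch. The class and method names are hypothetical, and firing when the watermark is greater than or equal to the timer timestamp is an assumption about the exact boundary. The tags show how several timers can coexist and be told apart when they fire.&lt;/p>

```python
import heapq


class EventTimeTimers:
    """Holds timers and fires those whose timestamp the watermark has reached."""

    def __init__(self):
        self._heap = []

    def set(self, timestamp, tag):
        heapq.heappush(self._heap, (timestamp, tag))

    def advance_watermark(self, watermark):
        """Return the (timestamp, tag) timers that fire at this watermark."""
        fired = []
        while self._heap and watermark >= self._heap[0][0]:
            fired.append(heapq.heappop(self._heap))
        return fired


timers = EventTimeTimers()
timers.set(120, "flush-b")
timers.set(60, "flush-a")
print(timers.advance_watermark(59))   # prints []
print(timers.advance_watermark(90))   # prints [(60, 'flush-a')]
print(timers.advance_watermark(200))  # prints [(120, 'flush-b')]
```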
&lt;p>For more information about state and timers, see the following pages:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#state-and-timers">Beam Programming Guide: State and Timers&lt;/a>&lt;/li>
&lt;li>&lt;a href="/blog/stateful-processing/">Stateful processing with Apache Beam&lt;/a>&lt;/li>
&lt;li>&lt;a href="/blog/timely-processing/">Timely (and Stateful) Processing with Apache Beam&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="splittable-dofn">Splittable DoFn&lt;/h2>
&lt;p>Splittable &lt;code>DoFn&lt;/code> (SDF) is a generalization of &lt;code>DoFn&lt;/code> that lets you process
elements in a non-monolithic way. Splittable &lt;code>DoFn&lt;/code> makes it easier to create
complex, modular I/O connectors in Beam.&lt;/p>
&lt;p>A regular &lt;code>ParDo&lt;/code> processes an entire element at a time, applying your regular
&lt;code>DoFn&lt;/code> and waiting for the call to terminate. When you instead apply a
splittable &lt;code>DoFn&lt;/code> to each element, the runner has the option of splitting the
element&amp;rsquo;s processing into smaller tasks. You can checkpoint the processing of an
element, and you can split the remaining work to yield additional parallelism.&lt;/p>
&lt;p>For example, imagine you want to read every line from very large text files.
When you write your splittable &lt;code>DoFn&lt;/code>, you can have separate pieces of logic to
read a segment of a file, split a segment of a file into sub-segments, and
report progress through the current segment. The runner can then invoke your
splittable &lt;code>DoFn&lt;/code> intelligently to split up each input and read portions
separately, in parallel.&lt;/p>
&lt;p>A common computation pattern has the following steps:&lt;/p>
&lt;ol>
&lt;li>The runner splits an incoming element before starting any processing.&lt;/li>
&lt;li>The runner starts running your processing logic on each sub-element.&lt;/li>
&lt;li>If the runner notices that some sub-elements are taking longer than others,
the runner splits those sub-elements further and repeats step 2.&lt;/li>
&lt;li>The sub-element either finishes processing, or the user chooses to
checkpoint the sub-element and the runner repeats step 2.&lt;/li>
&lt;/ol>
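&lt;p>The steps above can be sketched in plain Python by modeling the work for one element as a half-open range of positions, for example byte offsets in a file. The function names are hypothetical and this is not the Beam splittable &lt;code>DoFn&lt;/code> API. &lt;code>split_range&lt;/code> stands in for the initial split in step 1, and &lt;code>process_range&lt;/code> stands in for processing with a checkpoint that hands back the unprocessed remainder.&lt;/p>

```python
def split_range(start, stop, num_parts):
    """Split [start, stop) into up to `num_parts` contiguous sub-ranges."""
    size = stop - start
    bounds = [start + (size * i) // num_parts for i in range(num_parts + 1)]
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if hi > lo]


def process_range(start, stop, max_items):
    """Process up to `max_items` positions, then checkpoint.

    Returns (processed_positions, residual), where residual is the
    unprocessed remainder of the range, or None if the range is finished.
    """
    end = min(stop, start + max_items)
    processed = list(range(start, end))
    residual = (end, stop) if stop > end else None
    return processed, residual


print(split_range(0, 10, 3))
# prints [(0, 3), (3, 6), (6, 10)]
print(process_range(6, 10, max_items=3))
# prints ([6, 7, 8], (9, 10))
```

&lt;p>The residual range returned at a checkpoint can be scheduled again later or split further, which corresponds to repeating step 2.&lt;/p>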
&lt;p>You can also write your splittable &lt;code>DoFn&lt;/code> so that the runner can split unbounded
processing. For example, if you write a splittable &lt;code>DoFn&lt;/code> to watch a set of
directories and output filenames as they arrive, you can split to subdivide the
work of different directories. This allows the runner to split off a hot
directory and give it additional resources.&lt;/p>
&lt;p>For more information about Splittable &lt;code>DoFn&lt;/code>, see the following pages:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#splittable-dofns">Splittable DoFns&lt;/a>&lt;/li>
&lt;li>&lt;a href="/blog/splittable-do-fn-is-available/">Splittable DoFn in Apache Beam is Ready to Use&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="whats-next">What&amp;rsquo;s next&lt;/h2>
&lt;p>Take a look at our &lt;a href="/documentation/">other documentation&lt;/a> such as the Beam
programming guide, pipeline execution information, and transform reference
catalogs.&lt;/p></description></item><item><title>Documentation: BatchElements</title><link>/documentation/transforms/python/aggregation/batchelements/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/batchelements/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="batchelements">BatchElements&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_BatchElements"
data-show="batchelements"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_BatchElements%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22batchelements%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/groupintobatches">GroupIntoBatches&lt;/a> batches elements by key&lt;/li>
&lt;/ul></description></item><item><title>Documentation: ToList</title><link>/documentation/transforms/python/aggregation/tolist/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/tolist/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="batchelements">ToList&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.combiners.html#apache_beam.transforms.combiners.ToList"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_ToList"
data-show="tolist"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_ToList%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22tolist%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div></description></item><item><title>Documentation: Beam glossary</title><link>/documentation/glossary/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/glossary/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="apache-beam-glossary">Apache Beam glossary&lt;/h1>
&lt;h2 id="aggregation">Aggregation&lt;/h2>
&lt;p>A transform pattern for computing a value from multiple input elements. Aggregation is similar to the reduce operation in the &lt;a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce&lt;/a> model. Aggregation transforms include Combine (applies a user-defined function to all elements in the aggregation), Count (computes the count of all elements in the aggregation), Max (computes the maximum element in the aggregation), and Sum (computes the sum of all elements in the aggregation).&lt;/p>
&lt;p>For a list of built-in aggregation transforms, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/java/overview/#aggregation">Java Transform catalog&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/overview/#aggregation">Python Transform catalog&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#aggregation">Basics of the Beam model: Aggregation&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="apply">Apply&lt;/h2>
&lt;p>A method for invoking a transform on an input PCollection (or set of PCollections) to produce one or more output PCollections. The &lt;code>apply&lt;/code> method is attached to the PCollection (or value). Invoking multiple Beam transforms is similar to method chaining, but with a difference: You apply the transform to the input PCollection, passing the transform itself as an argument, and the operation returns the output PCollection. Because of Beam’s deferred execution model, applying a transform does not immediately execute that transform.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#applying-transforms">Applying transforms&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="batch-processing">Batch processing&lt;/h2>
&lt;p>A data processing paradigm for working with finite, or bounded, datasets. A bounded PCollection represents a dataset of a known, fixed size. Reading from a batch data source, such as a file or a database, creates a bounded PCollection. A batch processing job eventually ends, in contrast to a streaming job, which runs until cancelled.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#size-and-boundedness">Size and boundedness&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="bounded-data">Bounded data&lt;/h2>
&lt;p>A dataset of a known, fixed size (alternatively, a dataset that is not growing over time). A PCollection can be bounded or unbounded, depending on the source of the data that it represents. Reading from a batch data source, such as a file or a database, creates a bounded PCollection. Beam also supports reading a bounded amount of data from an unbounded source.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#size-and-boundedness">Size and boundedness&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="bundle">Bundle&lt;/h2>
&lt;p>The processing and commit/retry unit for elements in a PCollection. Instead of processing all elements in a PCollection simultaneously, Beam processes the elements in bundles. The runner handles the division of the collection into bundles, and in doing so it may optimize the bundle size for the use case. For example, a streaming runner might process smaller bundles than a batch runner.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/runtime/model/#bundling-and-persistence">Bundling and persistence&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="coder">Coder&lt;/h2>
&lt;p>A component that describes how the elements of a PCollection can be encoded and decoded. To support distributed processing and cross-language portability, Beam needs to be able to encode each element of a PCollection as bytes. The Beam SDKs provide built-in coders for common types and language-specific mechanisms for specifying the encoding of a PCollection.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#data-encoding-and-type-safety">Data encoding and type safety&lt;/a>&lt;/li>
&lt;/ul>
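&lt;p>As a rough illustration of what a coder does, the following plain-Python sketch round-trips elements through bytes. The class names are hypothetical and are not Beam&amp;rsquo;s built-in coders; the fixed-width integer encoding is an assumption chosen for brevity.&lt;/p>

```python
import struct


class SketchStringCoder:
    """Hypothetical coder: encodes str elements as UTF-8 bytes."""

    def encode(self, value):
        return value.encode("utf-8")

    def decode(self, data):
        return data.decode("utf-8")


class SketchInt64Coder:
    """Hypothetical coder: encodes ints as 8 big-endian bytes."""

    def encode(self, value):
        return struct.pack(">q", value)

    def decode(self, data):
        return struct.unpack(">q", data)[0]


string_coder = SketchStringCoder()
int_coder = SketchInt64Coder()

encoded_word = string_coder.encode("beam")
encoded_number = int_coder.encode(42)
```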
&lt;h2 id="cogroupbykey">CoGroupByKey&lt;/h2>
&lt;p>A PTransform that takes two or more PCollections and aggregates the elements by key. In effect, CoGroupByKey performs a relational join of two or more key/value PCollections that have the same key type. While GroupByKey performs this operation over a single input collection, CoGroupByKey operates over multiple input collections.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#cogroupbykey">CoGroupByKey&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/cogroupbykey/">CoGroupByKey (Java)&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/cogroupbykey/">CoGroupByKey (Python)&lt;/a>&lt;/li>
&lt;/ul>
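&lt;p>The join-like behavior can be sketched in plain Python. This is an illustration only, not the Beam transform: &lt;code>co_group_by_key&lt;/code> is a hypothetical helper, and the sample data is made up.&lt;/p>

```python
from collections import defaultdict


def co_group_by_key(**tagged_inputs):
    """Group values from several key/value lists by key.

    Returns {key: {tag: [values...]}} with an entry for every tag,
    which is empty when that input has no value for the key.
    """
    tags = list(tagged_inputs)
    result = defaultdict(lambda: {tag: [] for tag in tags})
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            result[key][tag].append(value)
    return dict(result)


emails = [("amy", "amy@example.com"), ("carl", "carl@example.com")]
phones = [("amy", "111-222-3333"), ("james", "222-333-4444")]

joined = co_group_by_key(emails=emails, phones=phones)
```

&lt;p>Keys that appear in only one input still produce a result, with an empty list for the other input.&lt;/p>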
&lt;h2 id="collection">Collection&lt;/h2>
&lt;p>See &lt;a href="/documentation/glossary/#pcollection">PCollection&lt;/a>.&lt;/p>
&lt;h2 id="combine">Combine&lt;/h2>
&lt;p>A PTransform for combining all elements of a PCollection or all values associated with a key. When you apply a Combine transform, you have to provide a user-defined function (UDF) that contains the logic for combining the elements or values. The combining function should be &lt;a href="https://en.wikipedia.org/wiki/Commutative_property">commutative&lt;/a> and &lt;a href="https://en.wikipedia.org/wiki/Associative_property">associative&lt;/a>, because the function is not necessarily invoked exactly once on all values with a given key.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#combine">Combine&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/combine/">Combine (Java)&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/combineglobally/">CombineGlobally (Python)&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/combineperkey/">CombinePerKey (Python)&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/combinevalues/">CombineValues (Python)&lt;/a>&lt;/li>
&lt;/ul>
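&lt;p>To see why the combining function should be commutative and associative, consider this plain-Python sketch. It assumes that values are combined per bundle and the partial results are then combined again; &lt;code>combine_in_bundles&lt;/code> is a hypothetical helper, not part of Beam.&lt;/p>

```python
from functools import reduce


def combine_in_bundles(values, bundle_size, combine_fn):
    """Combine each bundle separately, then combine the partial results."""
    bundles = [values[i:i + bundle_size] for i in range(0, len(values), bundle_size)]
    partials = [reduce(combine_fn, bundle) for bundle in bundles]
    return reduce(combine_fn, partials)


values = [3, 1, 4, 1, 5, 9, 2, 6]

# Addition is commutative and associative, so bundling does not matter.
sums = {combine_in_bundles(values, size, lambda a, b: a + b) for size in (1, 3, 8)}

# Subtraction is neither, so the result depends on how values were bundled.
differences = {combine_in_bundles(values, size, lambda a, b: a - b) for size in (1, 3, 8)}
```

&lt;p>Here &lt;code>sums&lt;/code> contains a single value regardless of bundle size, while &lt;code>differences&lt;/code> contains more than one.&lt;/p>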
&lt;h2 id="composite-transform">Composite transform&lt;/h2>
&lt;p>A PTransform that expands into many PTransforms. Composite transforms have a nested structure, in which a complex transform applies one or more simpler transforms. These simpler transforms could be existing Beam operations like ParDo, Combine, or GroupByKey, or they could be other composite transforms. Nesting multiple transforms inside a single composite transform can make your pipeline more modular and easier to understand. Many of the built-in transforms are composite transforms.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#composite-transforms">Composite transforms&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="counter-metric">Counter (metric)&lt;/h2>
&lt;p>A metric that reports a single long value and can be incremented. In the Beam model, metrics provide insight into the state of a pipeline, potentially while the pipeline is running.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#types-of-metrics">Types of metrics&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="cross-language-transforms">Cross-language transforms&lt;/h2>
&lt;p>Transforms that can be shared across Beam SDKs. With cross-language transforms, you can use transforms written in any supported SDK language (currently, Java and Python) in a pipeline written in a different SDK language. For example, you could use the Apache Kafka connector from the Java SDK in a Python streaming pipeline. Cross-language transforms make it possible to provide new functionality simultaneously in different SDKs.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#multi-language-pipelines">Multi-language pipelines&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="deferred-execution">Deferred execution&lt;/h2>
&lt;p>A feature of the Beam execution model. Beam operations are deferred, meaning that the result of a given operation may not be available for control flow. Deferred execution allows the Beam API to support parallel processing of data and perform pipeline-level optimizations.&lt;/p>
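&lt;p>The idea can be sketched in plain Python. The &lt;code>DeferredPipeline&lt;/code> class below is hypothetical and far simpler than a real Beam pipeline; it only shows that applying a step records it, and that no user code runs until the pipeline is executed.&lt;/p>

```python
class DeferredPipeline:
    """Hypothetical pipeline that records steps and runs them only on run()."""

    def __init__(self):
        self.steps = []

    def apply(self, fn):
        # Record the step; nothing is computed yet.
        self.steps.append(fn)
        return self

    def run(self, elements):
        for fn in self.steps:
            elements = [fn(element) for element in elements]
        return elements


calls = []


def double(x):
    calls.append(x)
    return x * 2


pipeline = DeferredPipeline().apply(double).apply(double)
calls_before_run = list(calls)
result = pipeline.run([1, 2])
```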
&lt;h2 id="distribution-metric">Distribution (metric)&lt;/h2>
&lt;p>A metric that reports information about the distribution of reported values. In the Beam model, metrics provide insight into the state of a pipeline, potentially while the pipeline is running.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#types-of-metrics">Types of metrics&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="dofn">DoFn&lt;/h2>
&lt;p>A function object used by ParDo (or some other transform) to process the elements of a PCollection, often producing elements for an output PCollection. A DoFn is a user-defined function, meaning that it contains custom code that defines a data processing task in your pipeline. The Beam system invokes a DoFn one or more times to process some arbitrary bundle of elements, but Beam doesn’t guarantee an exact number of invocations.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#pardo">ParDo&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="driver">Driver&lt;/h2>
&lt;p>A program that defines your pipeline, including all of the inputs, transforms, and outputs. To use Beam, you need to create a driver program using classes from one of the Beam SDKs. The driver program creates a pipeline and specifies the execution options that tell the pipeline where and how to run. These options include the runner, which determines what backend your pipeline will run on.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#overview">Overview&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="element">Element&lt;/h2>
&lt;p>The unit of data in a PCollection. Elements in a PCollection can be of any type, but they must all have the same type. This allows parallel computations to operate uniformly across the entire collection. Some element types have a structure that can be introspected (for example, JSON, Protocol Buffer, Avro, and database records).&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#pcollection-characteristics">PCollection characteristics&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="element-wise">Element-wise&lt;/h2>
&lt;p>A type of transform that independently processes each element in an input PCollection. Element-wise is similar to the map operation in the &lt;a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce&lt;/a> model. An element-wise transform might output 0, 1, or multiple values for each input element. This is in contrast to aggregation transforms, which compute a single value from multiple input elements. Element-wise operations include Filter, FlatMap, and ParDo.&lt;/p>
&lt;p>For a complete list of element-wise transforms, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/java/overview/#element-wise">Java Transform catalog&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/overview/#element-wise">Python Transform catalog&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="engine">Engine&lt;/h2>
&lt;p>A data-processing system, such as Dataflow, Spark, or Flink. A Beam runner for an engine executes a Beam pipeline on that engine.&lt;/p>
&lt;h2 id="event-time">Event time&lt;/h2>
&lt;p>The time a data event occurs, determined by a timestamp on an element. This is in contrast to processing time, which is when an element is processed in a pipeline. An event could be, for example, a user interaction or a write to an error log. There’s no guarantee that events will appear in a pipeline in order of event time, but windowing and timers let you reason correctly about event time.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#watermarks-and-late-data">Watermarks and late data&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#triggers">Triggers&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="expansion-service">Expansion Service&lt;/h2>
&lt;p>A service that enables a pipeline to apply (expand) cross-language transforms defined in other SDKs. For example, by connecting to a Java expansion service, the Python SDK can apply transforms implemented in Java. Currently, SDKs typically start up expansion services as local processes, but in the future Beam may support long-running expansion services. The development of expansion services is part of the ongoing effort to support multi-language pipelines.&lt;/p>
&lt;h2 id="flatten">Flatten&lt;/h2>
&lt;p>One of the core PTransforms. Flatten merges multiple PCollections into a single logical PCollection.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#flatten">Flatten&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/other/flatten/">Flatten (Java)&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/other/flatten/">Flatten (Python)&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="fn-api">Fn API&lt;/h2>
&lt;p>An interface that lets a runner invoke SDK-specific user-defined functions. The Fn API, together with the Runner API, supports the ability to mix and match SDKs and runners. Used together, the Fn and Runner APIs let new SDKs run on every runner, and let new runners run pipelines from every SDK.&lt;/p>
&lt;h2 id="fusion">Fusion&lt;/h2>
&lt;p>An optimization that Beam runners can apply before running a pipeline. When one transform outputs a PCollection that’s consumed by another transform, or when two or more transforms take the same PCollection as input, a runner may be able to fuse the transforms together into a single processing unit (a &lt;em>stage&lt;/em> in Dataflow). The consuming DoFn processes elements as they are emitted by the producing DoFn, rather than waiting for the entire intermediate PCollection to be computed. Fusion can make pipeline execution more efficient by avoiding the I/O otherwise needed to write and re-read the intermediate PCollection.&lt;/p>
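&lt;p>The difference between fused and unfused execution can be sketched with Python generators. This is an analogy under simplifying assumptions, not how any particular runner is implemented; &lt;code>produce&lt;/code> and &lt;code>consume&lt;/code> are hypothetical stand-ins for two DoFns.&lt;/p>

```python
def produce(items, log):
    for item in items:
        log.append(f"produce {item}")
        yield item * 10


def consume(items, log):
    for item in items:
        log.append(f"consume {item}")
        yield item + 1


# Unfused: the whole intermediate collection is materialized first.
unfused_log = []
intermediate = list(produce([1, 2], unfused_log))
unfused = list(consume(intermediate, unfused_log))

# Fused: each element flows straight from producer to consumer.
fused_log = []
fused = list(consume(produce([1, 2], fused_log), fused_log))
```

&lt;p>Both variants compute the same output, but in the fused variant the log interleaves producing and consuming, and no intermediate list is ever built.&lt;/p>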
&lt;h2 id="gauge-metric">Gauge (metric)&lt;/h2>
&lt;p>A metric that reports the latest value out of reported values. In the Beam model, metrics provide insight into the state of a pipeline, potentially while the pipeline is running. Because metrics are collected from many workers, the gauge value may not be the absolute last value, but it will be one of the latest values produced by one of the workers.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#types-of-metrics">Types of metrics&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="groupbykey">GroupByKey&lt;/h2>
&lt;p>A PTransform for processing collections of key/value pairs. GroupByKey is a parallel reduction operation, similar to the shuffle of a map/shuffle/reduce algorithm. The input to GroupByKey is a collection of key/value pairs in which multiple pairs have the same key but different values (i.e. a multimap). You can use GroupByKey to collect all of the values associated with each unique key.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#groupbykey">GroupByKey&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/groupbykey/">GroupByKey (Java)&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/groupbykey/">GroupByKey (Python)&lt;/a>&lt;/li>
&lt;/ul>
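&lt;p>A minimal plain-Python sketch of the grouping follows. The helper &lt;code>group_by_key&lt;/code> is hypothetical and runs on a single in-memory list, whereas the Beam transform operates on a distributed PCollection.&lt;/p>

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect all values that share a key into one list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


pairs = [("cat", 1), ("dog", 5), ("and", 1), ("cat", 5), ("dog", 2), ("cat", 9)]
grouped = group_by_key(pairs)
```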
&lt;h2 id="io-connector">I/O connector&lt;/h2>
&lt;p>A set of PTransforms for working with external data storage systems. When you create a pipeline, you often need to read from or write to external data systems such as files or databases. Beam provides read and write transforms for a number of common data storage types.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#pipeline-io">Pipeline I/O&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/io/built-in/">Built-in I/O Transforms&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="map">Map&lt;/h2>
&lt;p>An element-wise PTransform that applies a user-defined function (UDF) to each element in a PCollection. Using Map, you can transform each individual element into a new element, but you can&amp;rsquo;t change the number of elements.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/python/elementwise/map/">Map (Python)&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/elementwise/mapelements/">MapElements (Java)&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="metrics">Metrics&lt;/h2>
&lt;p>Data on the state of a pipeline, potentially while the pipeline is running. You can use the built-in Beam metrics to gain insight into the functioning of your pipeline. For example, you might use Beam metrics to track errors, calls to a backend service, or the number of elements processed. Beam currently supports three types of metric: Counter, Distribution, and Gauge.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#metrics">Metrics&lt;/a>&lt;/li>
&lt;/ul>
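&lt;p>The three metric types can be illustrated with small plain-Python classes. These are hypothetical stand-ins that only mimic what each metric reports; they are not the Beam metrics classes and ignore how values are aggregated across workers.&lt;/p>

```python
class Counter:
    """Single integer value that can be incremented."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        self.value += amount


class Distribution:
    """Summary statistics over all reported values."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.minimum = None
        self.maximum = None

    def update(self, value):
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)


class Gauge:
    """Latest reported value."""

    def __init__(self):
        self.value = None

    def set(self, value):
        self.value = value


processed, sizes, last_size = Counter(), Distribution(), Gauge()
for line in ["a", "bbb", "cc"]:
    processed.inc()
    sizes.update(len(line))
    last_size.set(len(line))
```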
&lt;h2 id="multi-language-pipeline">Multi-language pipeline&lt;/h2>
&lt;p>A pipeline that uses cross-language transforms. You can combine transforms written in any supported SDK language (currently, Java and Python) and use them in one multi-language pipeline.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#multi-language-pipelines">Multi-language pipelines&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="pardo">ParDo&lt;/h2>
&lt;p>The lowest-level element-wise PTransform. For each element in an input PCollection, ParDo applies a function and emits zero, one, or multiple elements to an output PCollection. “ParDo” is short for “Parallel Do.” It’s similar to the map operation in a &lt;a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce&lt;/a> algorithm and, when it follows a GroupByKey, to the reduce operation. ParDo is also comparable to the &lt;code>apply&lt;/code> method from a DataFrame, or the &lt;code>UPDATE&lt;/code> keyword from SQL.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#pardo">ParDo&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/elementwise/pardo/">ParDo (Java)&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/elementwise/pardo/">ParDo (Python)&lt;/a>&lt;/li>
&lt;/ul>
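&lt;p>The &amp;ldquo;zero, one, or multiple elements&amp;rdquo; behavior can be sketched in plain Python. The helper &lt;code>par_do&lt;/code> is hypothetical and sequential; it is not the Beam transform, which may process elements in parallel across workers.&lt;/p>

```python
def par_do(elements, fn):
    """Apply fn to each element and concatenate everything it emits."""
    outputs = []
    for element in elements:
        outputs.extend(fn(element))
    return outputs


def split_words(line):
    """Emit zero, one, or many words depending on the line."""
    return line.split()


words = par_do(["", "hello", "to be or not"], split_words)
```

&lt;p>The empty line contributes no output, the second line contributes one element, and the third contributes four.&lt;/p>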
&lt;h2 id="partition">Partition&lt;/h2>
&lt;p>An element-wise PTransform that splits a single PCollection into a fixed number of smaller, disjoint PCollections. Partition requires a user-defined function (UDF) to determine how to split up the elements of the input collection into the resulting output collections. The number of partitions must be determined at graph construction time, meaning that you can’t determine the number of partitions using data calculated by the running pipeline.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#partition">Partition&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/elementwise/partition/">Partition (Java)&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/elementwise/partition/">Partition (Python)&lt;/a>&lt;/li>
&lt;/ul>
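&lt;p>A plain-Python sketch of partitioning follows. The helper &lt;code>partition&lt;/code> and the function &lt;code>by_remainder&lt;/code> are hypothetical; the point is that the number of partitions is fixed up front and the user-defined function only chooses an index for each element.&lt;/p>

```python
def partition(elements, num_partitions, partition_fn):
    """Route each element to exactly one of num_partitions output lists."""
    partitions = [[] for _ in range(num_partitions)]
    for element in elements:
        index = partition_fn(element, num_partitions)
        partitions[index].append(element)
    return partitions


def by_remainder(number, num_partitions):
    """Choose a partition index from the remainder of the number."""
    return number % num_partitions


parts = partition(range(7), 3, by_remainder)
```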
&lt;h2 id="pcollection">PCollection&lt;/h2>
&lt;p>A potentially distributed, homogeneous dataset or data stream. PCollections represent data in a Beam pipeline, and Beam transforms (PTransforms) use PCollection objects as inputs and outputs. PCollections are intended to be immutable, meaning that once a PCollection is created, you can’t add, remove, or change individual elements. The “P” stands for “parallel.”&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#pcollection">Basics of the Beam model: PCollection&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#pcollections">Programming guide: PCollections&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="pipe-operator-">Pipe operator (&lt;code>|&lt;/code>)&lt;/h2>
&lt;p>Delimits a step in a Python pipeline. For example: &lt;code>[Final Output PCollection] = ([Initial Input PCollection] | [First Transform] | [Second Transform] | [Third Transform])&lt;/code>. The output of each transform is passed from left to right as input to the next transform. The pipe operator in Python is equivalent to the &lt;code>apply&lt;/code> method in Java (in other words, the pipe applies a transform to a PCollection), and usage is similar to the pipe operator in shell scripts, which lets you pass the output of one program into the input of another.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#applying-transforms">Applying transforms&lt;/a>&lt;/li>
&lt;/ul>
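&lt;p>The pipe syntax relies on Python operator overloading. The sketch below shows the general mechanism with a hypothetical &lt;code>Collection&lt;/code> class and two hypothetical transforms. Unlike Beam, this sketch runs each step immediately instead of deferring execution.&lt;/p>

```python
class Collection:
    """Hypothetical stand-in for a PCollection: an immutable tuple of elements."""

    def __init__(self, elements):
        self.elements = tuple(elements)

    def __or__(self, transform):
        # `collection | transform` applies the transform and returns a new collection.
        return Collection(transform(self.elements))


def map_each(fn):
    """Build a transform that applies fn to every element."""
    return lambda elements: [fn(e) for e in elements]


def keep_if(predicate):
    """Build a transform that keeps elements satisfying the predicate."""
    return lambda elements: [e for e in elements if predicate(e)]


result = Collection([1, 2, 3, 4]) | map_each(lambda x: x * x) | keep_if(lambda x: x > 4)
```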
&lt;h2 id="pipeline">Pipeline&lt;/h2>
&lt;p>An encapsulation of your entire data processing task, including reading input data from a source, transforming that data, and writing output data to a sink. You can think of a pipeline as a Beam program that uses PTransforms to process PCollections. (Alternatively, you can think of it as a single, executable composite PTransform with no inputs or outputs.) The transforms in a pipeline can be represented as a directed acyclic graph (DAG). All Beam driver programs must create a pipeline.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#pipeline">Basics of the Beam model: Pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#overview">Overview&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#creating-a-pipeline">Creating a pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/pipelines/design-your-pipeline/">Design your pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/pipelines/create-your-pipeline/">Create your pipeline&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="processing-time">Processing time&lt;/h2>
&lt;p>The real-world time at which an element is processed at some stage in a pipeline. Processing time is not the same as event time, which is the time at which a data event occurs. Processing time is determined by the clock on the system processing the element. There’s no guarantee that elements will be processed in order of event time.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#watermarks-and-late-data">Watermarks and late data&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#triggers">Triggers&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="ptransform">PTransform&lt;/h2>
&lt;p>A data processing operation, or a step, in your pipeline. A PTransform takes zero or more PCollections as input, applies a processing function to the elements of those PCollections, and produces zero or more output PCollections. Some PTransforms accept user-defined functions that apply custom logic. The “P” stands for “parallel.”&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#ptransform">Basics of the Beam model: PTransform&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#overview">Overview&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#transforms">Transforms&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="resource-hints">Resource hints&lt;/h2>
&lt;p>A Beam feature that lets you provide information to a runner about the compute resource requirements of your pipeline. You can use resource hints to define requirements for specific transforms or for an entire pipeline. For example, you could use a resource hint to specify the minimum amount of memory to allocate to workers. The runner is responsible for interpreting resource hints, and runners can ignore unsupported hints.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/runtime/resource-hints">Resource hints&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="runner">Runner&lt;/h2>
&lt;p>A runner runs a pipeline on a specific platform. Most runners are translators or adapters to massively parallel big data processing systems. Other runners exist for local testing and debugging. Among the supported runners are Google Cloud Dataflow, Apache Spark, Apache Samza, Apache Flink, the Interactive Runner, and the Direct Runner.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#runner">Basics of the Beam model: Runner&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/#choosing-a-runner">Choosing a Runner&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/runners/capability-matrix/">Beam Capability Matrix&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="schema">Schema&lt;/h2>
&lt;p>A language-independent type definition for the elements of a PCollection. The schema for a PCollection defines elements of that PCollection as an ordered list of named fields. Each field has a name, a type, and possibly a set of user options. Schemas provide a way to reason about types across different programming-language APIs. They also let you describe data transformations more succinctly and at a higher level.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#schema">Basics of the Beam model: Schema&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#schemas">Programming guide: Schemas&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/patterns/schema/">Schema Patterns&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="session">Session&lt;/h2>
&lt;p>A time interval for grouping data events. A session is defined by some minimum gap duration between events. For example, a data stream representing user mouse activity may have periods with high concentrations of clicks followed by periods of inactivity. A session can represent such a pattern of activity delimited by inactivity.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#session-windows">Session windows&lt;/a>&lt;/li>
&lt;li>&lt;a href="/get-started/mobile-gaming-example/#analyzing-usage-patterns">Analyzing Usage Patterns&lt;/a>&lt;/li>
&lt;/ul>
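&lt;p>Grouping by a minimum gap can be sketched in plain Python. The function &lt;code>sessions&lt;/code> is hypothetical, works on a sorted in-memory list of numeric timestamps, and ignores keys and late data, which real session windowing has to handle.&lt;/p>

```python
def sessions(timestamps, gap):
    """Group timestamps into sessions separated by at least `gap` of inactivity."""
    result = []
    for ts in sorted(timestamps):
        if result and gap > ts - result[-1][-1]:
            # Close enough to the previous event: same session.
            result[-1].append(ts)
        else:
            # First event, or the gap was reached: start a new session.
            result.append([ts])
    return result


clicks = [0, 2, 3, 20, 21, 50]
grouped = sessions(clicks, gap=10)
```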
&lt;h2 id="side-input">Side input&lt;/h2>
&lt;p>Additional input to a PTransform that is provided in its entirety, rather than element-by-element. Side input is input that you provide in addition to the main input PCollection. A DoFn can access side input each time it processes an element in the PCollection.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#side-inputs">Side inputs&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/patterns/side-inputs/">Side input patterns&lt;/a>&lt;/li>
&lt;/ul>
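&lt;p>As a plain-Python sketch, a side input is an extra value that is computed from a whole collection and then made available to every call of the processing function. The helper names below are hypothetical and do not reflect how Beam passes side inputs.&lt;/p>

```python
def par_do(elements, fn, side_input):
    """Apply fn to each element, passing the same side input every time."""
    outputs = []
    for element in elements:
        outputs.extend(fn(element, side_input))
    return outputs


def keep_long_words(word, min_length):
    """Emit the word only if it is at least min_length characters long."""
    if len(word) >= min_length:
        yield word


words = ["a", "bb", "cccc", "ddd"]

# The side input is computed from the entire collection, not one element.
average_length = sum(len(word) for word in words) / len(words)

long_words = par_do(words, keep_long_words, average_length)
```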
&lt;h2 id="sink">Sink&lt;/h2>
&lt;p>A transform that writes to an external data storage system, like a file or database.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/io/developing-io-overview/">Developing new I/O connectors&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#pipeline-io">Pipeline I/O&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/io/built-in/">Built-in I/O transforms&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="source">Source&lt;/h2>
&lt;p>A transform that reads from an external storage system. A pipeline typically reads input data from a source. The source has a type, which may be different from the sink type, so you can change the format of data as it moves through your pipeline.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/io/developing-io-overview/">Developing new I/O connectors&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#pipeline-io">Pipeline I/O&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/io/built-in/">Built-in I/O transforms&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="splittable-dofn">Splittable DoFn&lt;/h2>
&lt;p>A generalization of DoFn that makes it easier to create complex, modular I/O connectors. A Splittable DoFn (SDF) can process elements in a non-monolithic way, meaning that the processing can be decomposed into smaller tasks. With SDF, you can checkpoint the processing of an element, and you can split the remaining work to yield additional parallelism. SDF is recommended for building new I/O connectors.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#splittable-dofn">Basics of the Beam model: Splittable DoFn&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#splittable-dofns">Programming guide: Splittable DoFns&lt;/a>&lt;/li>
&lt;li>&lt;a href="/blog/splittable-do-fn-is-available/">Splittable DoFn in Apache Beam is Ready to Use&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="stage">Stage&lt;/h2>
&lt;p>The unit of fused transforms in a pipeline. Runners can perform fusion optimization to make pipeline execution more efficient. In Dataflow, the pipeline is conceptualized as a graph of fused stages.&lt;/p>
&lt;h2 id="state">State&lt;/h2>
&lt;p>Persistent values that a PTransform can access. The state API lets you augment element-wise operations (for example, ParDo or Map) with mutable state. Using the state API, you can read from, and write to, state as you process each element of a PCollection. You can use the state API together with the timer API to create processing tasks that give you fine-grained control over the workflow. State is always local to a key and window.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#state-and-timers">Basics of the Beam model: State and timers&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#state-and-timers">Programming guide: State and Timers&lt;/a>&lt;/li>
&lt;li>&lt;a href="/blog/stateful-processing/">Stateful processing with Apache Beam&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="streaming">Streaming&lt;/h2>
&lt;p>A data processing paradigm for working with infinite, or unbounded, datasets. Reading from a streaming data source, such as Pub/Sub or Kafka, creates an unbounded PCollection. An unbounded PCollection must be processed using a job that runs continuously, because the entire collection can never be available for processing at any one time.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#size-and-boundedness">Size and boundedness&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/sdks/python-streaming/">Python Streaming Pipelines&lt;/a>&lt;/li>
&lt;/ul>
&lt;h1 id="timer">Timer&lt;/h1>
&lt;p>A Beam feature that enables delayed processing of data stored using the state API. The timer API lets you set timers to call back at either an event-time or a processing-time timestamp. You can use the timer API together with the state API to create processing tasks that give you fine-grained control over the workflow.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#state-and-timers">Basics of the Beam model: State and timers&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#state-and-timers">Programming guide: State and Timers&lt;/a>&lt;/li>
&lt;li>&lt;a href="/blog/stateful-processing/">Stateful processing with Apache Beam&lt;/a>&lt;/li>
&lt;li>&lt;a href="/blog/timely-processing/">Timely (and Stateful) Processing with Apache Beam&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="timestamp">Timestamp&lt;/h2>
&lt;p>A point in event time associated with an element in a PCollection and used to assign a window to the element. The source that creates the PCollection assigns each element an initial timestamp, often corresponding to when the element was read or added. But you can also manually assign timestamps. This can be useful if elements have an inherent timestamp, but the timestamp is somewhere in the structure of the element itself (for example, a time field in a server log entry).&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#timestamp">Basics of the Beam model: Timestamp&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#element-timestamps">Element timestamps&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#adding-timestamps-to-a-pcollections-elements">Adding timestamps to a PCollection’s elements&lt;/a>&lt;/li>
&lt;/ul>
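&lt;p>As a minimal sketch of the server-log case, the plain-Python helper below pulls an event time out of an element's own structure. The log format (an ISO 8601 time as the first field) and the name &lt;code>with_event_time&lt;/code> are assumptions made for illustration. In the Beam Python SDK, a pair like this would typically be wrapped in &lt;code>beam.window.TimestampedValue&lt;/code>, which is not shown here.&lt;/p>

```python
from datetime import datetime
from typing import Tuple


def with_event_time(log_line: str) -> Tuple[str, float]:
    """Pair a log line with the event time (Unix seconds) parsed from its first field."""
    time_field = log_line.split(" ", 1)[0]
    return log_line, datetime.fromisoformat(time_field).timestamp()


# Hypothetical log entry whose first field is an ISO 8601 time with a UTC offset.
line = "2024-01-01T00:00:10+00:00 GET /index.html"
element, timestamp = with_event_time(line)
print(timestamp)  # 1704067210.0
```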
&lt;h2 id="transform">Transform&lt;/h2>
&lt;p>See PTransform.&lt;/p>
&lt;h2 id="trigger">Trigger&lt;/h2>
&lt;p>Determines when to emit aggregated result data from a window. You can use triggers to refine the windowing strategy for your pipeline. If you use the default windowing configuration and default trigger, Beam outputs an aggregated result when it estimates that all data for a window has arrived, and it discards all subsequent data for that window. But you can also use triggers to emit early results, before all the data in a given window has arrived, or to process late data by triggering after the event time watermark passes the end of the window.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#trigger">Basics of the Beam model: Trigger&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#triggers">Programming guide: Triggers&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="unbounded-data">Unbounded data&lt;/h2>
&lt;p>A dataset that grows over time, with elements processed as they arrive. A PCollection can be bounded or unbounded, depending on the source of the data that it represents. Reading from a streaming or continuously-updating data source, such as Pub/Sub or Kafka, typically creates an unbounded PCollection.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide/#size-and-boundedness">Size and boundedness&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="user-defined-function">User-defined function&lt;/h2>
&lt;p>Custom logic that a PTransform applies to your data. Some PTransforms accept a user-defined function (UDF) as a way to configure the transform. For example, ParDo expects user code in the form of a DoFn object. Each language SDK has its own idiomatic way of expressing user-defined functions, but there are some common requirements, like serializability and thread compatibility.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#user-defined-functions-udfs">Basics of the Beam model: User-Defined Functions (UDFs)&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#pardo">ParDo&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#requirements-for-writing-user-code-for-beam-transforms">Requirements for writing user code for Beam transforms&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="watermark">Watermark&lt;/h2>
&lt;p>An estimate of the lower bound of the timestamps that will be seen (in the future) at this point of the pipeline. Watermarks provide a way to estimate the completeness of input data. Every PCollection has an associated watermark. Once the watermark progresses past the end of a window, any element that arrives with a timestamp in that window is considered late data.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#watermark">Basics of the Beam model: Watermark&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#watermarks-and-late-data">Programming guide: Watermarks and late data&lt;/a>&lt;/li>
&lt;/ul>
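&lt;p>The late-data rule can be sketched in plain Python. This is not how a runner implements watermarks. It assumes fixed windows with an exclusive end and treats "past the end of a window" as the watermark having reached that end. The helper names are hypothetical.&lt;/p>

```python
def window_end(timestamp: int, size: int) -> int:
    """End (exclusive) of the fixed window of length `size` containing `timestamp`."""
    return timestamp - timestamp % size + size


def is_late(timestamp: int, watermark: int, size: int) -> bool:
    """True if the watermark has already reached the end of the element's window."""
    return watermark >= window_end(timestamp, size)


# With 60-second windows and the watermark at 130:
print(is_late(65, watermark=130, size=60))   # True: window [60, 120) is behind the watermark
print(is_late(125, watermark=130, size=60))  # False: window [120, 180) is still open
```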
&lt;h2 id="windowing">Windowing&lt;/h2>
&lt;p>Partitioning a PCollection into bounded subsets grouped by the timestamps of individual elements. In the Beam model, any PCollection – including unbounded PCollections – can be subdivided into logical windows. Each element in a PCollection is assigned to one or more windows according to the PCollection&amp;rsquo;s windowing function, and each individual window contains a finite number of elements. Transforms that aggregate multiple elements, such as GroupByKey and Combine, work implicitly on a per-window basis.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/basics/#window">Basics of the Beam model: Window&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/programming-guide/#windowing">Programming guide: Windowing&lt;/a>&lt;/li>
&lt;/ul>
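&lt;p>To make per-window aggregation concrete, the following plain-Python sketch assigns timestamped values to fixed windows and sums each window separately. It illustrates the concept rather than the Beam API. The function name, the (timestamp, value) element shape, and the 60-second window size are assumptions.&lt;/p>

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Window = Tuple[int, int]  # (start inclusive, end exclusive)


def sum_per_fixed_window(
    elements: Iterable[Tuple[int, int]], size: int
) -> Dict[Window, int]:
    """Sum values per fixed window; each element is a (timestamp, value) pair."""
    totals: Dict[Window, int] = defaultdict(int)
    for timestamp, value in elements:
        # The element's timestamp alone decides which window it belongs to.
        start = timestamp - timestamp % size
        totals[(start, start + size)] += value
    return dict(totals)


events = [(5, 1), (59, 2), (60, 3), (125, 4)]
print(sum_per_fixed_window(events, size=60))
# {(0, 60): 3, (60, 120): 3, (120, 180): 4}
```

&lt;p>Even if the input never ends, each window holds a finite set of elements, which is what lets an aggregation produce a result per window.&lt;/p>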
&lt;h2 id="worker">Worker&lt;/h2>
&lt;p>A container, process, or virtual machine (VM) that handles some part of the parallel processing of a pipeline. Each worker node has its own independent copy of state. A Beam runner might serialize elements between machines for communication purposes and for other reasons such as persistence.&lt;/p>
&lt;p>To learn more, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/runtime/model/">Execution model&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Documentation: Beam Programming Guide</title><link>/documentation/programming-guide/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/programming-guide/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="apache-beam-programming-guide">Apache Beam Programming Guide&lt;/h1>
&lt;p>The &lt;strong>Beam Programming Guide&lt;/strong> is intended for Beam users who want to use the
Beam SDKs to create data processing pipelines. It provides guidance for using
the Beam SDK classes to build and test your pipeline. The programming guide is
not intended as an exhaustive reference, but as a language-agnostic, high-level
guide to programmatically building your Beam pipeline. As the programming guide
is filled out, the text will include code samples in multiple languages to help
illustrate how to implement Beam concepts in your pipelines.&lt;/p>
&lt;p>If you want a brief introduction to Beam&amp;rsquo;s basic concepts before reading the
programming guide, take a look at the
&lt;a href="/documentation/basics/">Basics of the Beam model&lt;/a> page.&lt;/p>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;li data-value="go">Go SDK&lt;/li>
&lt;li data-value="typescript">TypeScript SDK&lt;/li>
&lt;li data-value="yaml">Yaml API&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;p class="language-py">The Python SDK supports Python 3.8, 3.9, 3.10, and 3.11.&lt;/p>
&lt;p class="language-go">The &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam">Go SDK&lt;/a> supports Go v1.20+.&lt;/p>
&lt;p class="language-typescript">The Typescript SDK supports Node v16+ and is still experimental.&lt;/p>
&lt;p class="language-yaml">YAML is supported as of Beam 2.52, but is under active development and the most
recent SDK is advised.&lt;/p>
&lt;h2 id="overview">1. Overview&lt;/h2>
&lt;p>To use Beam, you need to first create a driver program using the classes in one
of the Beam SDKs. Your driver program &lt;em>defines&lt;/em> your pipeline, including all of
the inputs, transforms, and outputs; it also sets execution options for your
pipeline (typically passed in using command-line options). These include the
Pipeline Runner, which, in turn, determines what back-end your pipeline will run
on.&lt;/p>
&lt;p>The Beam SDKs provide a number of abstractions that simplify the mechanics of
large-scale distributed data processing. The same Beam abstractions work with
both batch and streaming data sources. When you create your Beam pipeline, you
can think about your data processing task in terms of these abstractions. They
include:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>Pipeline&lt;/code>: A &lt;code>Pipeline&lt;/code> encapsulates your entire data processing task, from
start to finish. This includes reading input data, transforming that data, and
writing output data. All Beam driver programs must create a &lt;code>Pipeline&lt;/code>. When
you create the &lt;code>Pipeline&lt;/code>, you must also specify the execution options that
tell the &lt;code>Pipeline&lt;/code> where and how to run.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>PCollection&lt;/code>: A &lt;code>PCollection&lt;/code> represents a distributed data set that your
Beam pipeline operates on. The data set can be &lt;em>bounded&lt;/em>, meaning it comes
from a fixed source like a file, or &lt;em>unbounded&lt;/em>, meaning it comes from a
continuously updating source via a subscription or other mechanism. Your
pipeline typically creates an initial &lt;code>PCollection&lt;/code> by reading data from an
external data source, but you can also create a &lt;code>PCollection&lt;/code> from in-memory
data within your driver program. From there, &lt;code>PCollection&lt;/code>s are the inputs and
outputs for each step in your pipeline.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>PTransform&lt;/code>: A &lt;code>PTransform&lt;/code> represents a data processing operation, or a step,
in your pipeline. Every &lt;code>PTransform&lt;/code> takes one or more &lt;code>PCollection&lt;/code> objects as
input, performs a processing function that you provide on the elements of that
&lt;code>PCollection&lt;/code>, and produces zero or more output &lt;code>PCollection&lt;/code> objects.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;span class="language-go">
&lt;ul>
&lt;li>&lt;code>Scope&lt;/code>: The Go SDK has an explicit scope variable used to build a &lt;code>Pipeline&lt;/code>.
A &lt;code>Pipeline&lt;/code> can return its root scope with the &lt;code>Root()&lt;/code> method. The scope
variable is passed to &lt;code>PTransform&lt;/code> functions to place them in the &lt;code>Pipeline&lt;/code>
that owns the &lt;code>Scope&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/span>
&lt;ul>
&lt;li>I/O transforms: Beam comes with a number of &amp;ldquo;IOs&amp;rdquo; - library &lt;code>PTransform&lt;/code>s that
read or write data to various external storage systems.&lt;/li>
&lt;/ul>
&lt;span class="language-yaml">
Note that in Beam YAML, &lt;code>PCollection&lt;/code>s are either implicit (e.g. when using
&lt;code>chain&lt;/code>) or referenced by their producing &lt;code>PTransform&lt;/code>.
&lt;/span>
&lt;p>A typical Beam driver program works as follows:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Create&lt;/strong> a &lt;code>Pipeline&lt;/code> object and set the pipeline execution options, including
the Pipeline Runner.&lt;/li>
&lt;li>Create an initial &lt;code>PCollection&lt;/code> for pipeline data, either using the IOs
to read data from an external storage system, or using a &lt;code>Create&lt;/code> transform to
build a &lt;code>PCollection&lt;/code> from in-memory data.&lt;/li>
&lt;li>&lt;strong>Apply&lt;/strong> &lt;code>PTransform&lt;/code>s to each &lt;code>PCollection&lt;/code>. Transforms can change, filter,
group, analyze, or otherwise process the elements in a &lt;code>PCollection&lt;/code>. A
transform creates a new output &lt;code>PCollection&lt;/code> &lt;em>without modifying the input
collection&lt;/em>. A typical pipeline applies subsequent transforms to each new
output &lt;code>PCollection&lt;/code> in turn until processing is complete. However, note that
a pipeline does not have to be a single straight line of transforms applied
one after another: think of &lt;code>PCollection&lt;/code>s as variables and &lt;code>PTransform&lt;/code>s as
functions applied to these variables: the shape of the pipeline can be an
arbitrarily complex processing graph.&lt;/li>
&lt;li>Use IOs to write the final, transformed &lt;code>PCollection&lt;/code>(s) to an external storage system.&lt;/li>
&lt;li>&lt;strong>Run&lt;/strong> the pipeline using the designated Pipeline Runner.&lt;/li>
&lt;/ul>
&lt;p>When you run your Beam driver program, the Pipeline Runner that you designate
constructs a &lt;strong>workflow graph&lt;/strong> of your pipeline based on the &lt;code>PCollection&lt;/code>
objects you&amp;rsquo;ve created and transforms that you&amp;rsquo;ve applied. That graph is then
executed using the appropriate distributed processing back-end, becoming an
asynchronous &amp;ldquo;job&amp;rdquo; (or equivalent) on that back-end.&lt;/p>
&lt;h2 id="creating-a-pipeline">2. Creating a pipeline&lt;/h2>
&lt;p>The &lt;code>Pipeline&lt;/code> abstraction encapsulates all the data and steps in your data
processing task. Your Beam driver program typically starts by constructing a
&lt;span class="language-java">&lt;a href="https://beam.apache.org/releases/javadoc/2.56.0/index.html?org/apache/beam/sdk/Pipeline.html">Pipeline&lt;/a>&lt;/span>
&lt;span class="language-py">&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py">Pipeline&lt;/a>&lt;/span>
&lt;span class="language-go">&lt;a href="https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pipeline.go#L62">Pipeline&lt;/a>&lt;/span>
object, and then using that object as the basis for creating the pipeline&amp;rsquo;s data
sets as &lt;code>PCollection&lt;/code>s and its operations as &lt;code>PTransform&lt;/code>s.&lt;/p>
&lt;p>To use Beam, your driver program must first create an instance of the Beam SDK
class &lt;code>Pipeline&lt;/code> (typically in the &lt;code>main()&lt;/code> function). When you create your
&lt;code>Pipeline&lt;/code>, you&amp;rsquo;ll also need to set some &lt;strong>configuration options&lt;/strong>. You can set
your pipeline&amp;rsquo;s configuration options programmatically, but it&amp;rsquo;s often easier to
set the options ahead of time (or read them from the command line) and pass them
to the &lt;code>Pipeline&lt;/code> object when you create the object.&lt;/p>
&lt;span class="language-typescript">
A Pipeline in the TypeScript API is simply a function that will be called
with a single &lt;code>root&lt;/code> object and is passed to a Runner's &lt;code>run&lt;/code> method.
&lt;/span>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Start by defining the options for the pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PipelineOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Then create the pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">Pipeline&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">pass&lt;/span> &lt;span class="c1"># build your pipeline here&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// beam.Init() is an initialization hook that must be called
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// near the beginning of main(), before creating a pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Init&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Create the Pipeline object and root scope.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">pipeline&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">scope&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewPipelineWithRoot&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="k">await&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">createRunner&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nx">run&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">function&lt;/span> &lt;span class="nx">pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">root&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Use root to build a pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">});&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">pipeline&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">options&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>For a more in-depth tutorial on creating basic pipelines
in the Python SDK, please read and work through
&lt;a href="https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_basics_by_doing.ipynb">this colab notebook&lt;/a>.&lt;/p>
&lt;h3 id="configuring-pipeline-options">2.1. Configuring pipeline options&lt;/h3>
&lt;p>Use the pipeline options to configure different aspects of your pipeline, such
as the pipeline runner that will execute your pipeline and any runner-specific
configuration required by the chosen runner. Your pipeline options will
potentially include information such as your project ID or a location for
storing files.&lt;/p>
&lt;p class="language-java">When you run the pipeline on a runner of your choice, a copy of the
PipelineOptions will be available to your code. For example, if you add a PipelineOptions parameter
to a DoFn&amp;rsquo;s &lt;code>@ProcessElement&lt;/code> method, it will be populated by the system.&lt;/p>
&lt;h4 id="pipeline-options-cli">2.1.1. Setting PipelineOptions from command-line arguments&lt;/h4>
&lt;p class="language-java language-py">While you can configure your pipeline by creating a &lt;code>PipelineOptions&lt;/code> object and
setting the fields directly, the Beam SDKs include a command-line parser that
you can use to set fields in &lt;code>PipelineOptions&lt;/code> using command-line arguments.&lt;/p>
&lt;p class="language-java language-py">To read options from the command-line, construct your &lt;code>PipelineOptions&lt;/code> object
as demonstrated in the following example code:&lt;/p>
&lt;p class="language-go">Use Go flags to parse command line arguments to configure your pipeline. Flags must be parsed
before &lt;code>beam.Init()&lt;/code> is called.&lt;/p>
&lt;p class="language-typescript">Any Javascript object can be used as pipeline options.
One can either construct one manually, but it is also common to pass an object
created from command line options such as &lt;code>yargs.argv&lt;/code>.&lt;/p>
&lt;p class="language-yaml">Pipeline options are simply an optional YAML mapping property that is a sibling to
the pipeline definition itself.
It will be merged with whatever options are passed on the command line.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PipelineOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fromArgs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withValidation&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.options.pipeline_options&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">PipelineOptions&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">beam_options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptions&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// If beamx or Go flags are used, flags must be parsed first,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// before beam.Init() is called.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">flag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Parse&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">pipeline_options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">runner&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;default&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">project&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;my_project&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">runner&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">createRunner&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">pipeline_options&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">runnerFromCommandLineOptions&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">createRunner&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">yargs&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">argv&lt;/span>&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">pipeline&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">options&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">my_pipeline_option&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">my_value&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>This interprets command-line arguments that follow the format:&lt;/p>
&lt;pre tabindex="0">&lt;code>--&amp;lt;option&amp;gt;=&amp;lt;value&amp;gt;
&lt;/code>&lt;/pre>&lt;span class="language-java">
&lt;blockquote>
&lt;p>Appending the &lt;code>.withValidation&lt;/code> method checks for required
command-line arguments and validates argument values.&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;p class="language-java language-py">Building your &lt;code>PipelineOptions&lt;/code> this way lets you specify any of the options as
a command-line argument.&lt;/p>
&lt;p class="language-go">Defining flag variables this way lets you specify any of the options as a command-line argument.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> The &lt;a href="/get-started/wordcount-example">WordCount example pipeline&lt;/a>
demonstrates how to set pipeline options at runtime by using command-line
options.&lt;/p>
&lt;/blockquote>
&lt;h4 id="creating-custom-options">2.1.2. Creating custom options&lt;/h4>
&lt;p>You can add your own custom options in addition to the standard
&lt;code>PipelineOptions&lt;/code>.&lt;/p>
&lt;p class="language-java">To add your own options, define an interface with getter and
setter methods for each option.&lt;/p>
&lt;p>The following example shows how to add &lt;code>input&lt;/code> and &lt;code>output&lt;/code> custom options:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">interface&lt;/span> &lt;span class="nc">MyOptions&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PipelineOptions&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="nf">getInput&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">void&lt;/span> &lt;span class="nf">setInput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="nf">getOutput&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">void&lt;/span> &lt;span class="nf">setOutput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.options.pipeline_options&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">PipelineOptions&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MyOptions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">PipelineOptions&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@classmethod&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">_add_argparse_args&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">cls&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">parser&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;--input&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;--output&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Use standard Go flags to define pipeline options.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">var&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">input&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">flag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">String&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;input&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">output&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">flag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">String&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;output&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">yargs&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">argv&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="c1">// Or an alternative command-line parsing library.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Use options.input and options.output during pipeline construction.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>You can also specify a description, which appears when a user passes &lt;code>--help&lt;/code> as
a command-line argument, and a default value.&lt;/p>
&lt;p class="language-java language-py language-go">You set the description and default value as follows (with annotations in Java, and with arguments to the option definition in Python and Go):&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">interface&lt;/span> &lt;span class="nc">MyOptions&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PipelineOptions&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Description&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Input for the pipeline&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Default.String&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gs://my-bucket/input&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="nf">getInput&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">void&lt;/span> &lt;span class="nf">setInput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Description&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Output for the pipeline&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Default.String&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gs://my-bucket/output&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="nf">getOutput&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">void&lt;/span> &lt;span class="nf">setOutput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.options.pipeline_options&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">PipelineOptions&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MyOptions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">PipelineOptions&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@classmethod&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">_add_argparse_args&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">cls&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">parser&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--input&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">default&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;gs://dataflow-samples/shakespeare/kinglear.txt&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;The file path for the input text to process.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">parser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add_argument&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;--output&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">required&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">help&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;The path prefix for output files.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">input&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">flag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">String&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;input&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;gs://my-bucket/input&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;Input for the pipeline&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">output&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">flag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">String&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;output&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;gs://my-bucket/output&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;Output for the pipeline&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">For Python, you can also parse your custom options with &lt;code>argparse&lt;/code>; there
is no need to create a separate &lt;code>PipelineOptions&lt;/code> subclass.&lt;/p>
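&lt;p class="language-py">A minimal sketch of that approach, using only the standard-library &lt;code>argparse&lt;/code> module. The &lt;code>parse_args&lt;/code> helper and the sample argument values are illustrative assumptions, not part of the Beam API; the option names mirror the &lt;code>MyOptions&lt;/code> example above.&lt;/p>

```python
import argparse


def parse_args(argv=None):
    """Split custom options from the arguments meant for Beam (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--input',
        default='gs://dataflow-samples/shakespeare/kinglear.txt',
        help='The file path for the input text to process.')
    parser.add_argument(
        '--output', required=True, help='The path prefix for output files.')
    # parse_known_args returns the options defined above, plus a list of
    # every argument the parser did not recognize (for example --runner).
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args


# Sample arguments, assumed for illustration only.
known_args, pipeline_args = parse_args(
    ['--output=/tmp/out', '--runner=DirectRunner'])
# pipeline_args can then be handed to PipelineOptions(pipeline_args).
```

&lt;p class="language-py">Arguments that the parser does not recognize, such as &lt;code>--runner&lt;/code>, are returned unparsed in &lt;code>pipeline_args&lt;/code>, so they can still be passed on to &lt;code>PipelineOptions&lt;/code>.&lt;/p>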
&lt;p class="language-java">It&amp;rsquo;s recommended that you register your interface with &lt;code>PipelineOptionsFactory&lt;/code>
and then pass the interface when creating the &lt;code>PipelineOptions&lt;/code> object. When you
register your interface with &lt;code>PipelineOptionsFactory&lt;/code>, the &lt;code>--help&lt;/code> command can find
your custom options interface and add it to its output.
&lt;code>PipelineOptionsFactory&lt;/code> will also validate that your custom options are
compatible with all other registered options.&lt;/p>
&lt;p class="language-java">The following example code shows how to register your custom options interface
with &lt;code>PipelineOptionsFactory&lt;/code>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">register&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MyOptions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">MyOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fromArgs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValidation&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">as&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MyOptions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Now your pipeline can accept &lt;code>--input=value&lt;/code> and &lt;code>--output=value&lt;/code> as command-line arguments.&lt;/p>
&lt;h2 id="pcollections">3. PCollections&lt;/h2>
&lt;p>The &lt;span class="language-java">&lt;a href="https://beam.apache.org/releases/javadoc/2.56.0/index.html?org/apache/beam/sdk/values/PCollection.html">PCollection&lt;/a>&lt;/span>
&lt;span class="language-py">&lt;code>PCollection&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;a href="https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pcollection.go#L39">PCollection&lt;/a>&lt;/span>
abstraction represents a
potentially distributed, multi-element data set. You can think of a
&lt;code>PCollection&lt;/code> as &amp;ldquo;pipeline&amp;rdquo; data; Beam transforms use &lt;code>PCollection&lt;/code> objects as
inputs and outputs. As such, if you want to work with data in your pipeline, it
must be in the form of a &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>After you&amp;rsquo;ve created your &lt;code>Pipeline&lt;/code>, you&amp;rsquo;ll need to begin by creating at least
one &lt;code>PCollection&lt;/code> in some form. The &lt;code>PCollection&lt;/code> you create serves as the input
for the first operation in your pipeline.&lt;/p>
&lt;span class="language-yaml">
In Beam YAML, &lt;code>PCollection&lt;/code>s are either implicit (e.g. when using &lt;code>chain&lt;/code>)
or referred to by their producing &lt;code>PTransform&lt;/code>.
&lt;/span>
&lt;h3 id="creating-a-pcollection">3.1. Creating a PCollection&lt;/h3>
&lt;p>You create a &lt;code>PCollection&lt;/code> either by reading data from an external source using
Beam&amp;rsquo;s &lt;a href="#pipeline-io">Source API&lt;/a>, or by creating a &lt;code>PCollection&lt;/code> from data
stored in an in-memory collection class in your driver program. The former is
typically how a production pipeline would ingest data; Beam&amp;rsquo;s Source APIs
contain adapters to help you read from external sources like large cloud-based
files, databases, or subscription services. The latter is primarily useful for
testing and debugging purposes.&lt;/p>
&lt;h4 id="reading-external-source">3.1.1. Reading from an external source&lt;/h4>
&lt;p>To read from an external source, you use one of the &lt;a href="#pipeline-io">Beam-provided I/O
adapters&lt;/a>. The adapters vary in their exact usage, but all of them
read from some external data source and return a &lt;code>PCollection&lt;/code> whose elements
represent the data records in that source.&lt;/p>
&lt;p>Each data source adapter has a &lt;code>Read&lt;/code> transform; to read,
&lt;span class="language-java language-py language-go language-typescript">
you must apply that transform to the &lt;code>Pipeline&lt;/code> object itself.
&lt;/span>
&lt;span class="language-yaml">
place this transform in the &lt;code>source&lt;/code> or &lt;code>transforms&lt;/code> portion of the pipeline.
&lt;/span>
&lt;span class="language-java">&lt;code>TextIO.Read&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>io.ReadFromText&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>textio.Read&lt;/code>&lt;/span>
&lt;span class="language-typescript">&lt;code>textio.ReadFromText&lt;/code>&lt;/span>
&lt;span class="language-yaml">&lt;code>ReadFromText&lt;/code>&lt;/span>,
for example, reads from an
external text file and returns a &lt;code>PCollection&lt;/code> in which each element
&lt;span class="language-java language-py language-go language-typescript">
is a &lt;code>String&lt;/code> that
&lt;/span>
represents one line from the text file. Here&amp;rsquo;s how you
would apply &lt;span class="language-java">&lt;code>TextIO.Read&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>io.ReadFromText&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>textio.Read&lt;/code>&lt;/span>
&lt;span class="language-typescript">&lt;code>textio.ReadFromText&lt;/code>&lt;/span>
&lt;span class="language-yaml">&lt;code>ReadFromText&lt;/code>&lt;/span>
to your &lt;code>Pipeline&lt;/code> &lt;span class="language-typescript">root&lt;/span> to create
a &lt;code>PCollection&lt;/code>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Create the pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PipelineOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fromArgs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Pipeline&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Create the PCollection &amp;#39;lines&amp;#39; by applying a &amp;#39;Read&amp;#39; transform.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;ReadMyFile&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gs://some/inputData.txt&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ReadMyFile&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;gs://some/inputData.txt&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Read the file at the URI &amp;#39;gs://some/inputData.txt&amp;#39; and return
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// the lines as a PCollection&amp;lt;string&amp;gt;.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Note that the scope is passed as the first argument, which is
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// required when calling any transform.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">lines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;gs://some/inputData.txt&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">async&lt;/span> &lt;span class="kd">function&lt;/span> &lt;span class="nx">pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">root&lt;/span>: &lt;span class="kt">beam.Root&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Note that textio.ReadFromText is an AsyncPTransform.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kr">const&lt;/span> &lt;span class="nx">pcoll&lt;/span>: &lt;span class="kt">PCollection&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">string&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">await&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">applyAsync&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;path/to/text_pattern&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">pipeline&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">source&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ReadFromText&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">path&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>See the &lt;a href="#pipeline-io">section on I/O&lt;/a> to learn more about how to read from the
various data sources supported by the Beam SDK.&lt;/p>
&lt;h4 id="creating-pcollection-in-memory">3.1.2. Creating a PCollection from in-memory data&lt;/h4>
&lt;p class="language-java">To create a &lt;code>PCollection&lt;/code> from an in-memory Java &lt;code>Collection&lt;/code>, you use the
Beam-provided &lt;code>Create&lt;/code> transform. Much like a data adapter&amp;rsquo;s &lt;code>Read&lt;/code>, you apply
&lt;code>Create&lt;/code> directly to your &lt;code>Pipeline&lt;/code> object itself.&lt;/p>
&lt;p class="language-java">As parameters, &lt;code>Create&lt;/code> accepts the Java &lt;code>Collection&lt;/code> and a &lt;code>Coder&lt;/code> object. The
&lt;code>Coder&lt;/code> specifies how the elements in the &lt;code>Collection&lt;/code> should be
&lt;a href="#element-type">encoded&lt;/a>.&lt;/p>
&lt;p class="language-py">To create a &lt;code>PCollection&lt;/code> from an in-memory &lt;code>list&lt;/code>, you use the Beam-provided
&lt;code>Create&lt;/code> transform. Apply this transform directly to your &lt;code>Pipeline&lt;/code> object
itself.&lt;/p>
&lt;p class="language-go">To create a &lt;code>PCollection&lt;/code> from an in-memory &lt;code>slice&lt;/code>, you use the Beam-provided
&lt;code>beam.CreateList&lt;/code> transform. Pass the pipeline &lt;code>scope&lt;/code>, and the &lt;code>slice&lt;/code> to this transform.&lt;/p>
&lt;p class="language-typescript">To create a &lt;code>PCollection&lt;/code> from an in-memory &lt;code>array&lt;/code>, you use the Beam-provided
&lt;code>Create&lt;/code> transform. Apply this transform directly to your &lt;code>Root&lt;/code> object.&lt;/p>
&lt;p class="language-yaml">To create a &lt;code>PCollection&lt;/code> from an in-memory &lt;code>array&lt;/code>, you use the Beam-provided
&lt;code>Create&lt;/code> transform. Specify the elements in the pipeline itself.&lt;/p>
&lt;p>The following example code shows how to create a &lt;code>PCollection&lt;/code> from an in-memory
&lt;span class="language-java">&lt;code>List&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>list&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>slice&lt;/code>&lt;/span>
&lt;span class="language-typescript language-yaml">&lt;code>array&lt;/code>&lt;/span>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">main&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="n">args&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Create a Java Collection, in this case a List of Strings.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">LINES&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;To be, or not to be: that is the question: &amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;Whether &amp;#39;tis nobler in the mind to suffer &amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;The slings and arrows of outrageous fortune, &amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;Or to take arms against a sea of troubles, &amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Create the pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PipelineOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fromArgs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Pipeline&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Apply Create, passing the list and the coder, to create the PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">LINES&lt;/span>&lt;span class="o">)).&lt;/span>&lt;span class="na">setCoder&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">StringUtf8Coder&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;To be, or not to be: that is the question: &amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Whether &amp;#39;tis nobler in the mind to suffer &amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;The slings and arrows of outrageous fortune, &amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;Or to take arms against a sea of troubles, &amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">]))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">lines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;To be, or not to be: that is the question: &amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;Whether &amp;#39;tis nobler in the mind to suffer &amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;The slings and arrows of outrageous fortune, &amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;Or to take arms against a sea of troubles, &amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Create the Pipeline object and root scope.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// It&amp;#39;s conventional to use p as the Pipeline variable and
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// s as the scope variable.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">s&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewPipelineWithRoot&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Pass the slice to beam.CreateList, to create the pcollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The scope variable s is used to add the CreateList transform
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// to the pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">linesPCol&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">CreateList&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">lines&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">function&lt;/span> &lt;span class="nx">pipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">root&lt;/span>: &lt;span class="kt">beam.Root&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kr">const&lt;/span> &lt;span class="nx">pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;To be, or not to be: that is the question: &amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Whether &amp;#39;tis nobler in the mind to suffer &amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;The slings and arrows of outrageous fortune, &amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;Or to take arms against a sea of troubles, &amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">pipeline&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">transforms&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Create&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">elements&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="l">A&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="l">B&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="l">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="pcollection-characteristics">3.2. PCollection characteristics&lt;/h3>
&lt;p>A &lt;code>PCollection&lt;/code> is owned by the specific &lt;code>Pipeline&lt;/code> object for which it is
created; multiple pipelines cannot share a &lt;code>PCollection&lt;/code>.
&lt;span class="language-java">In some respects, a &lt;code>PCollection&lt;/code> functions like
a &lt;code>Collection&lt;/code> class. However, a &lt;code>PCollection&lt;/code> can differ in a few key ways:&lt;/span>&lt;/p>
&lt;h4 id="element-type">3.2.1. Element type&lt;/h4>
&lt;p>The elements of a &lt;code>PCollection&lt;/code> may be of any type, but must all be of the same
type. However, to support distributed processing, Beam needs to be able to
encode each individual element as a byte string (so elements can be passed
around to distributed workers). The Beam SDKs provide a data encoding mechanism
that includes built-in encoding for commonly used types as well as support for
specifying custom encodings as needed.&lt;/p>
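&lt;p>The encoding requirement can be pictured with a small plain-Python sketch. It does not use Beam's coder classes; &lt;code>Utf8StringCoder&lt;/code> and &lt;code>round_trip&lt;/code> are hypothetical names chosen for illustration. The sketch only shows the contract described above: every element can be turned into a byte string and recovered unchanged.&lt;/p>

```python
# Plain-Python illustration only; this is not Beam's Coder API.
class Utf8StringCoder:
    """Hypothetical coder that turns str elements into bytes and back."""

    def encode(self, element: str) -> bytes:
        # Bytes are what would be shipped between distributed workers.
        return element.encode("utf-8")

    def decode(self, encoded: bytes) -> str:
        return encoded.decode("utf-8")


def round_trip(coder, elements):
    """Encode every element to a byte string, then decode it again."""
    encoded = [coder.encode(element) for element in elements]
    return [coder.decode(payload) for payload in encoded]


lines = ["To be, or not to be", "that is the question"]
restored = round_trip(Utf8StringCoder(), lines)
```

&lt;p>A custom encoding would follow the same shape: a pair of functions that are exact inverses of each other for every element the collection can hold.&lt;/p>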
&lt;h4 id="element-schema">3.2.2. Element schema&lt;/h4>
&lt;p>In many cases, the element type in a &lt;code>PCollection&lt;/code> has a structure that can be introspected.
Examples are JSON, Protocol Buffer, Avro, and database records. Schemas provide a way to
express types as a set of named fields, allowing for more expressive aggregations.&lt;/p>
&lt;h4 id="immutability">3.2.3. Immutability&lt;/h4>
&lt;p>A &lt;code>PCollection&lt;/code> is immutable. Once created, you cannot add, remove, or change
individual elements. A Beam Transform might process each element of a
&lt;code>PCollection&lt;/code> and generate new pipeline data (as a new &lt;code>PCollection&lt;/code>), &lt;em>but it
does not consume or modify the original input collection&lt;/em>.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> Beam SDKs avoid unnecessary copying of elements, so &lt;code>PCollection&lt;/code>
contents are logically immutable, not physically immutable. Changes to input
elements may be visible to other DoFns executing within the same bundle, and may
cause correctness issues.
As a rule, it&amp;rsquo;s not safe to modify values provided to a DoFn.&lt;/p>
&lt;/blockquote>
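&lt;p>One way to follow this rule is to copy an element before changing it. The sketch below is plain Python, not a Beam &lt;code>DoFn&lt;/code>; the function name &lt;code>add_length_field&lt;/code> and the &lt;code>line&lt;/code> field are assumptions made for the example.&lt;/p>

```python
import copy


def add_length_field(element: dict) -> dict:
    """Return a new dict with an extra field; the input is left untouched."""
    # Copy first so that other code holding a reference to the same input
    # element never observes a change.
    output = copy.deepcopy(element)
    output["length"] = len(output["line"])
    return output


original = {"line": "To be, or not to be"}
result = add_length_field(original)
```

&lt;p>After the call, &lt;code>original&lt;/code> still has only its &lt;code>line&lt;/code> field, while &lt;code>result&lt;/code> carries the added &lt;code>length&lt;/code> field.&lt;/p>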
&lt;h4 id="random-access">3.2.4. Random access&lt;/h4>
&lt;p>A &lt;code>PCollection&lt;/code> does not support random access to individual elements. Instead,
Beam Transforms consider every element in a &lt;code>PCollection&lt;/code> individually.&lt;/p>
&lt;h4 id="size-and-boundedness">3.2.5. Size and boundedness&lt;/h4>
&lt;p>A &lt;code>PCollection&lt;/code> is a large, immutable &amp;ldquo;bag&amp;rdquo; of elements. There is no upper limit
on how many elements a &lt;code>PCollection&lt;/code> can contain; any given &lt;code>PCollection&lt;/code> might
fit in memory on a single machine, or it might represent a very large
distributed data set backed by a persistent data store.&lt;/p>
&lt;p>A &lt;code>PCollection&lt;/code> can be either &lt;strong>bounded&lt;/strong> or &lt;strong>unbounded&lt;/strong> in size. A
&lt;strong>bounded&lt;/strong> &lt;code>PCollection&lt;/code> represents a data set of a known, fixed size, while an
&lt;strong>unbounded&lt;/strong> &lt;code>PCollection&lt;/code> represents a data set of unlimited size. Whether a
&lt;code>PCollection&lt;/code> is bounded or unbounded depends on the source of the data set that
it represents. Reading from a batch data source, such as a file or a database,
creates a bounded &lt;code>PCollection&lt;/code>. Reading from a streaming or
continuously-updating data source, such as Pub/Sub or Kafka, creates an unbounded
&lt;code>PCollection&lt;/code> (unless you explicitly tell it not to).&lt;/p>
&lt;p>The bounded (or unbounded) nature of your &lt;code>PCollection&lt;/code> affects how Beam
processes your data. A bounded &lt;code>PCollection&lt;/code> can be processed using a batch job,
which might read the entire data set once, and perform processing in a job of
finite length. An unbounded &lt;code>PCollection&lt;/code> must be processed using a streaming
job that runs continuously, as the entire collection can never be available for
processing at any one time.&lt;/p>
&lt;p>Beam uses &lt;a href="#windowing">windowing&lt;/a> to divide a continuously updating unbounded
&lt;code>PCollection&lt;/code> into logical windows of finite size. These logical windows are
determined by some characteristic associated with a data element, such as a
&lt;strong>timestamp&lt;/strong>. Aggregation transforms (such as &lt;code>GroupByKey&lt;/code> and &lt;code>Combine&lt;/code>) work
on a per-window basis — as the data set is generated, they process each
&lt;code>PCollection&lt;/code> as a succession of these finite windows.&lt;/p>
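&lt;p>The idea of per-window aggregation can be sketched in plain Python. This is not Beam's windowing API; it assumes fixed-size windows, numeric timestamps in seconds, and the hypothetical helper names &lt;code>assign_fixed_window&lt;/code> and &lt;code>count_per_window&lt;/code>.&lt;/p>

```python
from collections import defaultdict


def assign_fixed_window(timestamp, size):
    """Return the (start, end) bounds of the fixed window holding timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def count_per_window(timestamped_elements, size):
    """Count elements per key, separately for each fixed window."""
    counts = defaultdict(int)
    for key, timestamp in timestamped_elements:
        window = assign_fixed_window(timestamp, size)
        counts[(window, key)] += 1
    return dict(counts)


# (key, timestamp in seconds) pairs standing in for a stream of events.
events = [("a", 5), ("a", 42), ("b", 61), ("a", 75)]
per_window = count_per_window(events, 60)
```

&lt;p>Here the two early &lt;code>a&lt;/code> events are counted together in the first 60-second window, while the later &lt;code>a&lt;/code> and &lt;code>b&lt;/code> events are counted in the second one, so no aggregation ever needs the whole data set at once.&lt;/p>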
&lt;h4 id="element-timestamps">3.2.6. Element timestamps&lt;/h4>
&lt;p>Each element in a &lt;code>PCollection&lt;/code> has an associated intrinsic &lt;strong>timestamp&lt;/strong>. The
timestamp for each element is initially assigned by the &lt;a href="#pipeline-io">Source&lt;/a>
that creates the &lt;code>PCollection&lt;/code>. Sources that create an unbounded &lt;code>PCollection&lt;/code>
often assign each new element a timestamp that corresponds to when the element
was read or added.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note&lt;/strong>: Sources that create a bounded &lt;code>PCollection&lt;/code> for a fixed data set
also automatically assign timestamps, but the most common behavior is to
assign every element the same timestamp (&lt;code>Long.MIN_VALUE&lt;/code>).&lt;/p>
&lt;/blockquote>
&lt;p>Timestamps are useful for a &lt;code>PCollection&lt;/code> that contains elements with an
inherent notion of time. If your pipeline is reading a stream of events, like
Tweets or other social media messages, each element might use the time the event
was posted as the element timestamp.&lt;/p>
&lt;p>You can manually assign timestamps to the elements of a &lt;code>PCollection&lt;/code> if the
source doesn&amp;rsquo;t do it for you. You&amp;rsquo;ll want to do this if the elements have an
inherent timestamp, but the timestamp is somewhere in the structure of the
element itself (such as a &amp;ldquo;time&amp;rdquo; field in a server log entry). Beam has
&lt;a href="#transforms">Transforms&lt;/a> that take a &lt;code>PCollection&lt;/code> as input and output an
identical &lt;code>PCollection&lt;/code> with timestamps attached; see &lt;a href="#adding-timestamps-to-a-pcollections-elements">Adding
Timestamps&lt;/a> for more information
about how to do so.&lt;/p>
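&lt;p>As a rough plain-Python picture of that step, the sketch below pulls an event time out of each log entry and pairs it with the unchanged entry. It is not the Beam transform itself; the entry layout, the timestamp format, and the names &lt;code>extract_event_time&lt;/code> and &lt;code>attach_timestamps&lt;/code> are assumptions for illustration.&lt;/p>

```python
from datetime import datetime, timezone


def extract_event_time(log_entry: dict) -> float:
    """Parse the entry's "time" field into seconds since the Unix epoch."""
    parsed = datetime.strptime(log_entry["time"], "%Y-%m-%dT%H:%M:%S")
    # The assumed log format carries no zone, so treat it as UTC.
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def attach_timestamps(entries):
    """Pair each unchanged entry with the timestamp found inside it."""
    return [(entry, extract_event_time(entry)) for entry in entries]


logs = [{"time": "1970-01-01T00:01:00", "msg": "start"}]
stamped = attach_timestamps(logs)
```

&lt;p>The elements themselves are identical before and after; only the associated timestamp is new.&lt;/p>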
&lt;h2 id="transforms">4. Transforms&lt;/h2>
&lt;p>Transforms are the operations in your pipeline, and provide a generic
processing framework. You provide processing logic in the form of a function
object (colloquially referred to as &amp;ldquo;user code&amp;rdquo;), and your user code is applied
to each element of an input &lt;code>PCollection&lt;/code> (or more than one &lt;code>PCollection&lt;/code>).
Depending on the pipeline runner and back-end that you choose, many different
workers across a cluster may execute instances of your user code in parallel.
The user code running on each worker generates the output elements that are
ultimately added to the final output &lt;code>PCollection&lt;/code> that the transform produces.&lt;/p>
&lt;blockquote>
&lt;p>Aggregation is an important concept to understand when learning about Beam&amp;rsquo;s
transforms. For an introduction to aggregation, see the Basics of the Beam
model &lt;a href="/documentation/basics/#aggregation">Aggregation section&lt;/a>.&lt;/p>
&lt;/blockquote>
&lt;p>The Beam SDKs contain a number of different transforms that you can apply to
your pipeline&amp;rsquo;s &lt;code>PCollection&lt;/code>s. These include general-purpose core transforms,
such as &lt;a href="#pardo">ParDo&lt;/a> or &lt;a href="#combine">Combine&lt;/a>. There are also pre-written
&lt;a href="#composite-transforms">composite transforms&lt;/a> included in the SDKs, which
combine one or more of the core transforms in a useful processing pattern, such
as counting or combining elements in a collection. You can also define your own
more complex composite transforms to fit your pipeline&amp;rsquo;s exact use case.&lt;/p>
&lt;p>For a more in-depth tutorial on applying various transforms
in the Python SDK, work through
&lt;a href="https://colab.sandbox.google.com/github/liferoad/beam/blob/learn-transforms/examples/notebooks/get-started/learn_beam_transforms_by_doing.ipynb">this Colab notebook&lt;/a>.&lt;/p>
&lt;h3 id="applying-transforms">4.1. Applying transforms&lt;/h3>
&lt;p>To invoke a transform, you must &lt;strong>apply&lt;/strong> it to the input &lt;code>PCollection&lt;/code>. Each
transform in the Beam SDKs has a generic &lt;code>apply&lt;/code> method
&lt;span class="language-py">(or pipe operator &lt;code>|&lt;/code>)&lt;/span>.
Invoking multiple Beam transforms is similar to &lt;em>method chaining&lt;/em>, but with one
slight difference: You apply the transform to the input &lt;code>PCollection&lt;/code>, passing
the transform itself as an argument, and the operation returns the output
&lt;code>PCollection&lt;/code>.
&lt;span class="language-yaml">&lt;code>array&lt;/code>&lt;/span>
In YAML, transforms are applied by listing their inputs.
&lt;/span>
This takes the general form:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">[&lt;/span>&lt;span class="n">Output&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span>&lt;span class="n">Input&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">].&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">([&lt;/span>&lt;span class="n">Transform&lt;/span>&lt;span class="o">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="n">Output&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">Input&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">Transform&lt;/span>&lt;span class="p">]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">Output&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Transform&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Input&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">Output&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Input&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">].&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="nx">Transform&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">Output&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">await&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Input&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">].&lt;/span>&lt;span class="nx">applyAsync&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="nx">AsyncTransform&lt;/span>&lt;span class="p">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">pipeline&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">transforms&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ProducingTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ProducingTransformType&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MyTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MyTransformType&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ProducingTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-yaml">If a transform has more than one
(&lt;a href="https://beam.apache.org/documentation/sdks/yaml-errors/">non-error&lt;/a>) output,
the various outputs can be identified by explicitly giving the output name.&lt;/p>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">pipeline&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">transforms&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ProducingTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ProducingTransformType&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MyTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MyTransformType&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ProducingTransform.output_name&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MyTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MyTransformType&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ProducingTransform.another_output_name&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>
&lt;p class="language-yaml">For linear pipelines, this can be further simplified by setting
the pipeline&amp;rsquo;s type to &lt;code>chain&lt;/code>, which determines each transform&amp;rsquo;s input
implicitly from the ordering of the transforms. For example:&lt;/p>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">pipeline&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">chain&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">transforms&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ProducingTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ReadTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MyTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MyTransformType&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ConsumingTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">WriteTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;/p>
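&lt;p class="language-yaml">One way to picture &lt;code>type: chain&lt;/code> is as filling in each transform&amp;rsquo;s &lt;code>input&lt;/code> from the transform listed just before it. The following is a minimal plain-Python sketch of that idea, not Beam&amp;rsquo;s actual implementation: the helper name &lt;code>resolve_chain_inputs&lt;/code> and the dictionary layout are assumptions made for illustration.&lt;/p>

```python
from copy import deepcopy


def resolve_chain_inputs(transforms):
    """Return a copy of `transforms` with explicit `input` fields.

    Hypothetical helper: every transform after the first reads from the
    transform listed immediately before it, mirroring a linear pipeline.
    """
    resolved = deepcopy(transforms)
    for previous, current in zip(resolved, resolved[1:]):
        current["input"] = previous["name"]
    return resolved


# Same shape as the chain example above, with configs omitted.
chain = [
    {"name": "ProducingTransform", "type": "ReadTransform"},
    {"name": "MyTransform", "type": "MyTransformType"},
    {"name": "ConsumingTransform", "type": "WriteTransform"},
]

explicit = resolve_chain_inputs(chain)
for transform in explicit:
    print(transform["name"], "reads from", transform.get("input", "nothing"))
```

&lt;p class="language-yaml">In this model the first transform has no input, and every later transform names its predecessor, which is the wiring you would otherwise write by hand.&lt;/p>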
&lt;p class="language-java language-py language-typescript">Because Beam uses a generic &lt;code>apply&lt;/code> method for &lt;code>PCollection&lt;/code>, you can both chain
transforms sequentially and also apply transforms that contain other transforms
nested within (called &lt;a href="#composite-transforms">composite transforms&lt;/a> in the Beam
SDKs).&lt;/p>
&lt;p class="language-go">It&amp;rsquo;s recommended to create a new variable for each new &lt;code>PCollection&lt;/code> to
sequentially transform input data. &lt;code>Scope&lt;/code>s can be used to create functions
that contain other transforms
(called &lt;a href="#composite-transforms">composite transforms&lt;/a> in the Beam SDKs).&lt;/p>
&lt;p>How you apply your pipeline&amp;rsquo;s transforms determines the structure of your
pipeline. The best way to think of your pipeline is as a directed acyclic graph,
where &lt;code>PTransform&lt;/code> nodes are subroutines that accept &lt;code>PCollection&lt;/code> nodes as
inputs and emit &lt;code>PCollection&lt;/code> nodes as outputs.
&lt;span class="language-java language-py">
For example, you can chain together transforms to create a pipeline that successively modifies input data:
&lt;/span>
&lt;span class="language-go">
For example, you can successively call transforms on PCollections to modify the input data:
&lt;/span>&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">[&lt;/span>&lt;span class="n">Final&lt;/span> &lt;span class="n">Output&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span>&lt;span class="n">Initial&lt;/span> &lt;span class="n">Input&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">].&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">([&lt;/span>&lt;span class="n">First&lt;/span> &lt;span class="n">Transform&lt;/span>&lt;span class="o">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">([&lt;/span>&lt;span class="n">Second&lt;/span> &lt;span class="n">Transform&lt;/span>&lt;span class="o">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">([&lt;/span>&lt;span class="n">Third&lt;/span> &lt;span class="n">Transform&lt;/span>&lt;span class="o">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="n">Final&lt;/span> &lt;span class="n">Output&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">([&lt;/span>&lt;span class="n">Initial&lt;/span> &lt;span class="n">Input&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">First&lt;/span> &lt;span class="n">Transform&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">Second&lt;/span> &lt;span class="n">Transform&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">Third&lt;/span> &lt;span class="n">Transform&lt;/span>&lt;span class="p">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">Second&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">First&lt;/span> &lt;span class="nx">Transform&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Initial&lt;/span> &lt;span class="nx">Input&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">Third&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Second&lt;/span> &lt;span class="nx">Transform&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Second&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">Final&lt;/span> &lt;span class="nx">Output&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Third&lt;/span> &lt;span class="nx">Transform&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Third&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">Final&lt;/span> &lt;span class="nx">Output&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Initial&lt;/span> &lt;span class="nx">Input&lt;/span> &lt;span class="nx">PCollection&lt;/span>&lt;span class="p">].&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="nx">First&lt;/span> &lt;span class="nx">Transform&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="nx">Second&lt;/span> &lt;span class="nx">Transform&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="nx">Third&lt;/span> &lt;span class="nx">Transform&lt;/span>&lt;span class="p">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>The graph of this pipeline looks like the following:&lt;/p>
&lt;p>&lt;img src="/images/design-your-pipeline-linear.svg" alt="This linear pipeline starts with one input collection, sequentially applies three transforms, and ends with one output collection.">&lt;/p>
&lt;p>&lt;em>Figure 1: A linear pipeline with three sequential transforms.&lt;/em>&lt;/p>
&lt;p>However, note that a transform &lt;em>does not consume or otherwise alter&lt;/em> the input
collection — remember that a &lt;code>PCollection&lt;/code> is immutable by definition. This means
that you can apply multiple transforms to the same input &lt;code>PCollection&lt;/code> to create
a branching pipeline, like so:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">[&lt;/span>&lt;span class="n">PCollection&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="n">database&lt;/span> &lt;span class="n">table&lt;/span> &lt;span class="n">rows&lt;/span>&lt;span class="o">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span>&lt;span class="n">Database&lt;/span> &lt;span class="n">Table&lt;/span> &lt;span class="n">Reader&lt;/span>&lt;span class="o">].&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">([&lt;/span>&lt;span class="n">Read&lt;/span> &lt;span class="n">Transform&lt;/span>&lt;span class="o">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">[&lt;/span>&lt;span class="n">PCollection&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="sc">&amp;#39;A&amp;#39;&lt;/span> &lt;span class="n">names&lt;/span>&lt;span class="o">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span>&lt;span class="n">PCollection&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="n">database&lt;/span> &lt;span class="n">table&lt;/span> &lt;span class="n">rows&lt;/span>&lt;span class="o">].&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">([&lt;/span>&lt;span class="n">Transform&lt;/span> &lt;span class="n">A&lt;/span>&lt;span class="o">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">[&lt;/span>&lt;span class="n">PCollection&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="sc">&amp;#39;B&amp;#39;&lt;/span> &lt;span class="n">names&lt;/span>&lt;span class="o">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">[&lt;/span>&lt;span class="n">PCollection&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="n">database&lt;/span> &lt;span class="n">table&lt;/span> &lt;span class="n">rows&lt;/span>&lt;span class="o">].&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">([&lt;/span>&lt;span class="n">Transform&lt;/span> &lt;span class="n">B&lt;/span>&lt;span class="o">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="n">PCollection&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="n">database&lt;/span> &lt;span class="n">table&lt;/span> &lt;span class="n">rows&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">Database&lt;/span> &lt;span class="n">Table&lt;/span> &lt;span class="n">Reader&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">Read&lt;/span> &lt;span class="n">Transform&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="n">PCollection&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="s1">&amp;#39;A&amp;#39;&lt;/span> &lt;span class="n">names&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">PCollection&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="n">database&lt;/span> &lt;span class="n">table&lt;/span> &lt;span class="n">rows&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">Transform&lt;/span> &lt;span class="n">A&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="n">PCollection&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="s1">&amp;#39;B&amp;#39;&lt;/span> &lt;span class="n">names&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">PCollection&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="n">database&lt;/span> &lt;span class="n">table&lt;/span> &lt;span class="n">rows&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">Transform&lt;/span> &lt;span class="n">B&lt;/span>&lt;span class="p">]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="nx">of&lt;/span> &lt;span class="nx">database&lt;/span> &lt;span class="nx">table&lt;/span> &lt;span class="nx">rows&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Read&lt;/span> &lt;span class="nx">Transform&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Database&lt;/span> &lt;span class="nx">Table&lt;/span> &lt;span class="nx">Reader&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="nx">of&lt;/span> &lt;span class="sc">&amp;#39;A&amp;#39;&lt;/span> &lt;span class="nx">names&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Transform&lt;/span> &lt;span class="nx">A&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="nx">of&lt;/span> &lt;span class="nx">database&lt;/span> &lt;span class="nx">table&lt;/span> &lt;span class="nx">rows&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="nx">of&lt;/span> &lt;span class="sc">&amp;#39;B&amp;#39;&lt;/span> &lt;span class="nx">names&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Transform&lt;/span> &lt;span class="nx">B&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="nx">of&lt;/span> &lt;span class="nx">database&lt;/span> &lt;span class="nx">table&lt;/span> &lt;span class="nx">rows&lt;/span>&lt;span class="p">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="k">of&lt;/span> &lt;span class="nx">database&lt;/span> &lt;span class="nx">table&lt;/span> &lt;span class="nx">rows&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">Database&lt;/span> &lt;span class="nx">Table&lt;/span> &lt;span class="nx">Reader&lt;/span>&lt;span class="p">].&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="nx">Read&lt;/span> &lt;span class="nx">Transform&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="k">of&lt;/span> &lt;span class="s1">&amp;#39;A&amp;#39;&lt;/span> &lt;span class="nx">names&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="k">of&lt;/span> &lt;span class="nx">database&lt;/span> &lt;span class="nx">table&lt;/span> &lt;span class="nx">rows&lt;/span>&lt;span class="p">].&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="nx">Transform&lt;/span> &lt;span class="nx">A&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">[&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="k">of&lt;/span> &lt;span class="s1">&amp;#39;B&amp;#39;&lt;/span> &lt;span class="nx">names&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="k">of&lt;/span> &lt;span class="nx">database&lt;/span> &lt;span class="nx">table&lt;/span> &lt;span class="nx">rows&lt;/span>&lt;span class="p">].&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="nx">Transform&lt;/span> &lt;span class="nx">B&lt;/span>&lt;span class="p">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>The graph of this branching pipeline looks like the following:&lt;/p>
&lt;p>&lt;img src="/images/design-your-pipeline-multiple-pcollections.svg" alt="This pipeline applies two transforms to a single input collection. Each transform produces an output collection.">&lt;/p>
&lt;p>&lt;em>Figure 2: A branching pipeline. Two transforms are applied to a single
PCollection of database table rows.&lt;/em>&lt;/p>
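&lt;p>Because a transform never modifies its input, each branch sees exactly the same elements. The following plain-Python sketch models that behavior with an immutable tuple standing in for the &lt;code>PCollection&lt;/code>. It does not use the Beam SDK, and the sample names and the &lt;code>starts_with&lt;/code> helper are made up for illustration.&lt;/p>

```python
def starts_with(letter):
    """Build a stand-in transform that keeps names beginning with `letter`."""
    def transform(names):
        # Build a new tuple instead of touching the input.
        return tuple(name for name in names if name.startswith(letter))
    return transform


# Stand-in for the PCollection of database table rows.
rows = ("Alice", "Bob", "Anna", "Bert", "Carol")

a_names = starts_with("A")(rows)  # Transform A
b_names = starts_with("B")(rows)  # Transform B

print(a_names)
print(b_names)
print(rows)  # The input is unchanged after both branches.
```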
&lt;p>You can also build your own &lt;a href="#composite-transforms">composite transforms&lt;/a> that
nest multiple transforms inside a single, larger transform. Composite transforms
are particularly useful for building a reusable sequence of simple steps that
are used in many different places.&lt;/p>
&lt;p class="language-python">The pipe syntax also lets you apply PTransforms to &lt;code>tuple&lt;/code>s and &lt;code>dict&lt;/code>s of
PCollections, for transforms that accept multiple inputs (such as
&lt;code>Flatten&lt;/code> and &lt;code>CoGroupByKey&lt;/code>).&lt;/p>
&lt;p class="language-typescript">PTransforms can also be applied to any &lt;code>PValue&lt;/code>, which includes the Root object,
PCollections, arrays of &lt;code>PValue&lt;/code>s, and objects with &lt;code>PValue&lt;/code> values.
One can apply transforms to these composite types by wrapping them with
&lt;code>beam.P&lt;/code>, e.g.
&lt;code>beam.P({left: pcollA, right: pcollB}).apply(transformExpectingTwoPCollections)&lt;/code>.&lt;/p>
&lt;p class="language-typescript">PTransforms come in two flavors, synchronous and asynchronous, depending on
whether their &lt;em>application&lt;/em> involves asynchronous invocations.
An &lt;code>AsyncTransform&lt;/code> must be applied with &lt;code>applyAsync&lt;/code> and returns a &lt;code>Promise&lt;/code>
which must be awaited before further pipeline construction.&lt;/p>
&lt;h3 id="core-beam-transforms">4.2. Core Beam transforms&lt;/h3>
&lt;p>Beam provides the following core transforms, each of which represents a different
processing paradigm:&lt;/p>
&lt;ul>
&lt;li>&lt;code>ParDo&lt;/code>&lt;/li>
&lt;li>&lt;code>GroupByKey&lt;/code>&lt;/li>
&lt;li>&lt;code>CoGroupByKey&lt;/code>&lt;/li>
&lt;li>&lt;code>Combine&lt;/code>&lt;/li>
&lt;li>&lt;code>Flatten&lt;/code>&lt;/li>
&lt;li>&lt;code>Partition&lt;/code>&lt;/li>
&lt;/ul>
&lt;p class="language-typescript">The TypeScript SDK provides some of the most basic of these transforms
as methods on &lt;code>PCollection&lt;/code> itself.&lt;/p>
&lt;h4 id="pardo">4.2.1. ParDo&lt;/h4>
&lt;p>&lt;code>ParDo&lt;/code> is a Beam transform for generic parallel processing. The &lt;code>ParDo&lt;/code>
processing paradigm is similar to the &amp;ldquo;Map&amp;rdquo; phase of a
&lt;a href="https://en.wikipedia.org/wiki/MapReduce">Map/Shuffle/Reduce&lt;/a>-style
algorithm: a &lt;code>ParDo&lt;/code> transform considers each element in the input
&lt;code>PCollection&lt;/code>, performs some processing function (your user code) on that
element, and emits zero, one, or multiple elements to an output &lt;code>PCollection&lt;/code>.&lt;/p>
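&lt;p>The zero, one, or multiple outputs per element can be modeled in plain Python as a generator function applied to every element. This is only a sketch of the semantics, not the Beam API: &lt;code>par_do&lt;/code> and &lt;code>split_words&lt;/code> are hypothetical names, and a real runner would process elements in parallel rather than in a single loop.&lt;/p>

```python
def par_do(process, elements):
    """Apply `process` to each element and collect everything it yields."""
    output = []
    for element in elements:
        # `process` may yield no values, one value, or several values.
        output.extend(process(element))
    return output


def split_words(line):
    """Yield each word in `line`; blank lines yield nothing."""
    for word in line.split():
        yield word


lines = ["to be or not", "", "hello"]
words = par_do(split_words, lines)
print(words)
```

&lt;p>Here the first line contributes four elements, the empty line contributes none, and the last line contributes one.&lt;/p>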
&lt;p>&lt;code>ParDo&lt;/code> is useful for a variety of common data processing operations, including:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Filtering a data set.&lt;/strong> You can use &lt;code>ParDo&lt;/code> to consider each element in a
&lt;code>PCollection&lt;/code> and either output that element to a new collection or discard
it.&lt;/li>
&lt;li>&lt;strong>Formatting or type-converting each element in a data set.&lt;/strong> If your input
&lt;code>PCollection&lt;/code> contains elements that are of a different type or format than
you want, you can use &lt;code>ParDo&lt;/code> to perform a conversion on each element and
output the result to a new &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;li>&lt;strong>Extracting parts of each element in a data set.&lt;/strong> If you have a
&lt;code>PCollection&lt;/code> of records with multiple fields, for example, you can use a
&lt;code>ParDo&lt;/code> to parse out just the fields you want to consider into a new
&lt;code>PCollection&lt;/code>.&lt;/li>
&lt;li>&lt;strong>Performing computations on each element in a data set.&lt;/strong> You can use &lt;code>ParDo&lt;/code>
to perform simple or complex computations on every element, or certain
elements, of a &lt;code>PCollection&lt;/code> and output the results as a new &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;/ul>
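&lt;p class="language-py">The four roles listed above differ only in what the per-element function emits. The following plain-Python sketch illustrates each one; it does not use the Beam SDK. The helper &lt;code>model_par_do&lt;/code>, the per-element functions, and the sample records are all hypothetical, and lists stand in for &lt;code>PCollection&lt;/code>s.&lt;/p>

```python
from typing import Callable, Dict, Iterable, Iterator, List


def model_par_do(process: Callable, elements: Iterable) -> List:
    """Hypothetical stand-in for ParDo: flatten the outputs of process over all elements."""
    return [out for element in elements for out in process(element)]


def keep_long_words(word: str) -> Iterator[str]:
    # Filtering: emit the element unchanged, or emit nothing to discard it.
    if len(word) > 3:
        yield word


def to_int(text: str) -> Iterator[int]:
    # Type conversion: one input element becomes one output element of another type.
    yield int(text)


def extract_name(record: Dict[str, str]) -> Iterator[str]:
    # Extraction: keep only the field of interest from a multi-field record.
    yield record["name"]


def word_length(word: str) -> Iterator[int]:
    # Computation: derive a new value from each element.
    yield len(word)


long_words = model_par_do(keep_long_words, ["a", "beam", "of", "light"])
numbers = model_par_do(to_int, ["1", "20", "300"])
names = model_par_do(
    extract_name,
    [{"name": "ada", "team": "x"}, {"name": "lin", "team": "y"}],
)
lengths = model_par_do(word_length, ["beam", "light"])

print(long_words)  # ['beam', 'light']
print(numbers)     # [1, 20, 300]
print(names)       # ['ada', 'lin']
print(lengths)     # [4, 5]
```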
&lt;p>In such roles, &lt;code>ParDo&lt;/code> is a common intermediate step in a pipeline. You might
use it to extract certain fields from a set of raw input records, or convert raw
input into a different format; you might also use &lt;code>ParDo&lt;/code> to convert processed
data into a format suitable for output, like database table rows or printable
strings.&lt;/p>
&lt;p class="language-java language-go language-py language-typescript">When you apply a &lt;code>ParDo&lt;/code> transform, you&amp;rsquo;ll need to provide user code in the form
of a &lt;code>DoFn&lt;/code> object. &lt;code>DoFn&lt;/code> is a Beam SDK class that defines a distributed
processing function.&lt;/p>
&lt;p class="language-yaml">In Beam YAML, &lt;code>ParDo&lt;/code> operations are expressed by the &lt;code>MapToFields&lt;/code>, &lt;code>Filter&lt;/code>,
and &lt;code>Explode&lt;/code> transform types. These types can take a UDF in the language of your
choice, rather than introducing the notion of a &lt;code>DoFn&lt;/code>.
See &lt;a href="https://beam.apache.org/documentation/sdks/yaml-udf/">the page on mapping fns&lt;/a> for more details.&lt;/p>
&lt;span class="language-java language-py">
&lt;blockquote>
&lt;p>When you create a subclass of &lt;code>DoFn&lt;/code>, note that your subclass should adhere to
the &lt;a href="#requirements-for-writing-user-code-for-beam-transforms">Requirements for writing user code for Beam transforms&lt;/a>.&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;p class="language-go">All DoFns should be registered using a generic &lt;code>register.DoFnXxY[...]&lt;/code>
function. This allows the Go SDK to infer an encoding from any inputs/outputs,
registers the DoFn for execution on remote runners, and optimizes the runtime
execution of the DoFns by reducing reflection overhead.&lt;/p>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ComputeWordLengthFn is a DoFn that computes the word length of string elements.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">ComputeWordLengthFn&lt;/span> &lt;span class="kd">struct&lt;/span>&lt;span class="p">{}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ProcessElement computes the length of word and emits the result.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// When creating structs as a DoFn, the ProcessElement method performs the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// work of this step in the pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// 2 inputs and 1 output =&amp;gt; DoFn2x1
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Input/output types are included in order in the brackets
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">DoFn2x1&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">{})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h5 id="applying-pardo">4.2.1.1. Applying ParDo&lt;/h5>
&lt;p class="language-java">Like all Beam transforms, you apply &lt;code>ParDo&lt;/code> by calling the &lt;code>apply&lt;/code> method on the
input &lt;code>PCollection&lt;/code> and passing &lt;code>ParDo&lt;/code> as an argument, as shown in the
following example code:&lt;/p>
&lt;p class="language-py">Like all Beam transforms, you apply &lt;code>ParDo&lt;/code> by piping the input
&lt;code>PCollection&lt;/code> into &lt;code>beam.ParDo&lt;/code> with the &lt;code>|&lt;/code> operator and passing the &lt;code>DoFn&lt;/code> as an argument, as shown in the
following example code:&lt;/p>
&lt;p class="language-go">&lt;code>beam.ParDo&lt;/code> applies the passed-in &lt;code>DoFn&lt;/code> argument to the input &lt;code>PCollection&lt;/code>,
as shown in the following example code:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The input PCollection of Strings.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The DoFn to perform on each element in the input PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">ComputeWordLengthFn&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Apply a ParDo to the PCollection &amp;#34;words&amp;#34; to compute lengths for each word.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">wordLengths&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ParDo&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">ComputeWordLengthFn&lt;/span>&lt;span class="o">()));&lt;/span> &lt;span class="c1">// The DoFn to perform on each element, which
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// we define above.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The input PCollection of Strings.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The DoFn to perform on each element in the input PCollection.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ComputeWordLengthFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="p">)]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Apply a ParDo to the PCollection &amp;#34;words&amp;#34; to compute lengths for each word.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">word_lengths&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ComputeWordLengthFn&lt;/span>&lt;span class="p">())&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ComputeWordLengthFn is the DoFn to perform on each element in the input PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">ComputeWordLengthFn&lt;/span> &lt;span class="kd">struct&lt;/span>&lt;span class="p">{}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ProcessElement is the method to execute for each element.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// DoFns must be registered with beam.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RegisterType&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">TypeOf&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">)(&lt;/span>&lt;span class="kc">nil&lt;/span>&lt;span class="p">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// 2 inputs and 0 outputs =&amp;gt; DoFn2x0
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// 1 input =&amp;gt; Emitter1
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Input/output types are included in order in the brackets
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">DoFn2x0&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">)](&lt;/span>&lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">{})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Emitter1&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">]()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// words is an input PCollection of strings
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">var&lt;/span> &lt;span class="nx">words&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">wordLengths&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">{},&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The input PCollection of strings.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">words&lt;/span> : &lt;span class="kt">PCollection&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">string&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The DoFn to perform on each element in the input PCollection.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">function&lt;/span> &lt;span class="nx">computeWordLengthFn&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">DoFn&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">string&lt;/span>&lt;span class="err">,&lt;/span> &lt;span class="na">number&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">process&lt;/span>: &lt;span class="kt">function&lt;/span>&lt;span class="o">*&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">element&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="nx">element&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">length&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">parDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">computeWordLengthFn&lt;/span>&lt;span class="p">()));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>In the example, our input &lt;code>PCollection&lt;/code> contains &lt;span class="language-java language-py">&lt;code>String&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>string&lt;/code>&lt;/span> values. We apply a
&lt;code>ParDo&lt;/code> transform that specifies a function (&lt;code>ComputeWordLengthFn&lt;/code>) to compute
the length of each string, and outputs the result to a new &lt;code>PCollection&lt;/code> of
&lt;span class="language-java language-py">&lt;code>Integer&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>int&lt;/code>&lt;/span> values that stores the length of each word.&lt;/p>
&lt;h5 id="4212-creating-a-dofn">4.2.1.2. Creating a DoFn&lt;/h5>
&lt;p>The &lt;code>DoFn&lt;/code> object that you pass to &lt;code>ParDo&lt;/code> contains the processing logic that
gets applied to the elements in the input collection. When you use Beam, the
most important pieces of code you&amp;rsquo;ll write are often these &lt;code>DoFn&lt;/code>s, because they
define your pipeline&amp;rsquo;s exact data processing tasks.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> When you create your &lt;code>DoFn&lt;/code>, be mindful of the &lt;a href="#requirements-for-writing-user-code-for-beam-transforms">Requirements
for writing user code for Beam transforms&lt;/a>
and ensure that your code follows them.&lt;/p>
&lt;/blockquote>
&lt;p class="language-java">A &lt;code>DoFn&lt;/code> processes one element at a time from the input &lt;code>PCollection&lt;/code>. When you
create a subclass of &lt;code>DoFn&lt;/code>, you&amp;rsquo;ll need to provide type parameters that match
the types of the input and output elements. If your &lt;code>DoFn&lt;/code> processes incoming
&lt;code>String&lt;/code> elements and produces &lt;code>Integer&lt;/code> elements for the output collection
(like our previous example, &lt;code>ComputeWordLengthFn&lt;/code>), your class declaration would
look like this:&lt;/p>
&lt;p class="language-go">A &lt;code>DoFn&lt;/code> processes one element at a time from the input &lt;code>PCollection&lt;/code>. When you
create a &lt;code>DoFn&lt;/code> struct, you&amp;rsquo;ll need to give its &lt;code>ProcessElement&lt;/code> method parameter types that match
the types of the input and output elements.
If your &lt;code>DoFn&lt;/code> processes incoming &lt;code>string&lt;/code> elements and produces &lt;code>int&lt;/code> elements
for the output collection (like our previous example, &lt;code>ComputeWordLengthFn&lt;/code>), your &lt;code>DoFn&lt;/code> could
look like this:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">ComputeWordLengthFn&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ComputeWordLengthFn is a DoFn that computes the word length of string elements.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">ComputeWordLengthFn&lt;/span> &lt;span class="kd">struct&lt;/span>&lt;span class="p">{}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ProcessElement computes the length of word and emits the result.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// When creating structs as a DoFn, the ProcessElement method performs the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// work of this step in the pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// 2 inputs and 0 outputs =&amp;gt; DoFn2x0
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// 1 input =&amp;gt; Emitter1
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Input/output types are included in order in the brackets
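&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">DoFn2x0&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">)](&lt;/span>&lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">{})&lt;/span>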
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Emitter1&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">]()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">Inside your &lt;code>DoFn&lt;/code> subclass, you&amp;rsquo;ll write a method annotated with
&lt;code>@ProcessElement&lt;/code> where you provide the actual processing logic. You don&amp;rsquo;t need
to manually extract the elements from the input collection; the Beam SDKs handle
that for you. Your &lt;code>@ProcessElement&lt;/code> method should accept a parameter tagged with
&lt;code>@Element&lt;/code>, which will be populated with the input element. In order to output
elements, the method can also take a parameter of type &lt;code>OutputReceiver&lt;/code> which
provides a method for emitting elements. The parameter types must match the input
and output types of your &lt;code>DoFn&lt;/code> or the framework will raise an error. Note: &lt;code>@Element&lt;/code> and
&lt;code>OutputReceiver&lt;/code> were introduced in Beam 2.5.0; if using an earlier release of Beam, a
&lt;code>ProcessContext&lt;/code> parameter should be used instead.&lt;/p>
&lt;p class="language-py language-typescript">Inside your &lt;code>DoFn&lt;/code> subclass, you&amp;rsquo;ll write a method &lt;code>process&lt;/code> where you provide
the actual processing logic. You don&amp;rsquo;t need to manually extract the elements
from the input collection; the Beam SDKs handle that for you. Your &lt;code>process&lt;/code> method
should accept an argument &lt;code>element&lt;/code>, which is the input element, and return an
iterable with its output values. You can accomplish this by emitting individual
elements with &lt;code>yield&lt;/code> statements, and by using &lt;code>yield from&lt;/code> to emit all elements from
an iterable, such as a list or a generator. Using a &lt;code>return&lt;/code> statement
with an iterable is also acceptable as long as you don&amp;rsquo;t mix &lt;code>yield&lt;/code> and
&lt;code>return&lt;/code> statements in the same &lt;code>process&lt;/code> method, since that leads to &lt;a href="https://github.com/apache/beam/issues/22969">incorrect behavior&lt;/a>.&lt;/p>
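&lt;p class="language-py">The caution about mixing &lt;code>yield&lt;/code> and &lt;code>return&lt;/code> follows from how Python generators work: any function whose body contains &lt;code>yield&lt;/code> is a generator function, so a &lt;code>return&lt;/code> with a value inside it ends iteration instead of emitting that value. The sketch below shows this in plain Python without importing Beam. The function names are hypothetical, and &lt;code>list(...)&lt;/code> stands in for the SDK consuming the outputs of &lt;code>process&lt;/code>.&lt;/p>

```python
def process_with_yield(element):
    # Emits each word individually.
    for word in element.split():
        yield word


def process_with_return(element):
    # Returns an iterable holding all outputs at once.
    return element.split()


def process_mixed(element):
    # The yield below makes this whole function a generator.
    if " " not in element:
        return [element]  # Ends the generator; this list is never emitted.
    for word in element.split():
        yield word


yielded = list(process_with_yield("hello beam"))
returned = list(process_with_return("hello beam"))
mixed_multi = list(process_mixed("hello beam"))
mixed_single = list(process_mixed("hello"))

print(yielded)       # ['hello', 'beam']
print(returned)      # ['hello', 'beam']
print(mixed_multi)   # ['hello', 'beam']
print(mixed_single)  # [] (the single-word element is silently dropped)
```

&lt;p class="language-py">The first two styles produce the same outputs. The mixed version loses the single-word element without raising an error, which is why this kind of bug is easy to miss.&lt;/p>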
&lt;p class="language-go">For your &lt;code>DoFn&lt;/code> type, you&amp;rsquo;ll write a method &lt;code>ProcessElement&lt;/code> where you provide
the actual processing logic. You don&amp;rsquo;t need to manually extract the elements
from the input collection; the Beam SDKs handle that for you. Your &lt;code>ProcessElement&lt;/code> method
should accept a parameter &lt;code>element&lt;/code>, which is the input element. In order to output elements,
the method can also take a function parameter, which can be called to emit elements.
The parameter types must match the input and output types of your &lt;code>DoFn&lt;/code>
or the framework will raise an error.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">ComputeWordLengthFn&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Use OutputReceiver.output to emit the output element.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">length&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ComputeWordLengthFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="p">)]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ComputeWordLengthFn is the DoFn to perform on each element in the input PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">ComputeWordLengthFn&lt;/span> &lt;span class="kd">struct&lt;/span>&lt;span class="p">{}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ProcessElement is the method to execute for each element.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// DoFns must be registered with beam.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RegisterType&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">TypeOf&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">)(&lt;/span>&lt;span class="kc">nil&lt;/span>&lt;span class="p">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// 2 inputs and 0 outputs =&amp;gt; DoFn2x0
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// 1 input =&amp;gt; Emitter1
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Input/output types are included in order in the brackets
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">DoFn2x0&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">)](&lt;/span>&lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">{})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Emitter1&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">]()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">function&lt;/span> &lt;span class="nx">computeWordLengthFn&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">DoFn&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">string&lt;/span>&lt;span class="err">,&lt;/span> &lt;span class="na">number&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">process&lt;/span>: &lt;span class="kt">function&lt;/span>&lt;span class="o">*&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">element&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="nx">element&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">length&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-go">Simple DoFns can also be written as functions.&lt;/p>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">ComputeWordLengthFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// 2 inputs and 0 outputs =&amp;gt; Function2x0
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// 1 input =&amp;gt; Emitter1
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Type parameters are inferred from the function signature
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function2x0&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ComputeWordLengthFn&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Emitter1&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">]()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;span class="language-go">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> Whether you use a structural &lt;code>DoFn&lt;/code> type or a functional &lt;code>DoFn&lt;/code>, register it with
Beam in an &lt;code>init&lt;/code> block. Otherwise it may not execute on distributed runners.&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;span class="language-java">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> If the elements in your input &lt;code>PCollection&lt;/code> are key/value pairs, you
can access the key or value by using &lt;code>element.getKey()&lt;/code> or
&lt;code>element.getValue()&lt;/code>, respectively.&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;span class="language-go">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> If the elements in your input &lt;code>PCollection&lt;/code> are key/value pairs, your
process element method must have two parameters, one for the key and one for the
value. Similarly, key/value pairs are output as separate
parameters to a single &lt;code>emitter function&lt;/code>.&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;p>A given &lt;code>DoFn&lt;/code> instance generally gets invoked one or more times to process some
arbitrary bundle of elements. However, Beam doesn&amp;rsquo;t guarantee an exact number of
invocations; it may be invoked multiple times on a given worker node to account
for failures and retries. As such, you can cache information across multiple
calls to your processing method, but if you do so, make sure the implementation
&lt;strong>does not depend on the number of invocations&lt;/strong>.&lt;/p>
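&lt;p class="language-py">As a minimal sketch of caching that does not depend on the number of invocations, the plain-Python class below mimics the shape of a &lt;code>process&lt;/code> method without importing Beam. The class name &lt;code>LookupCachingFn&lt;/code> and the &lt;code>load_table&lt;/code> loader are hypothetical; in a real pipeline the class would extend &lt;code>beam.DoFn&lt;/code>.&lt;/p>

```python
class LookupCachingFn:
    """Stand-in for a beam.DoFn subclass; only the process() shape matters here."""

    def __init__(self, load_table):
        self._load_table = load_table  # hypothetical loader for reference data
        self._table = None  # cache filled lazily, reused across process() calls

    def process(self, element):
        if self._table is None:
            self._table = self._load_table()
        # The output depends only on the element and the cached table,
        # never on how many times process() has already run.
        return [(element, self._table.get(element, 0))]


fn = LookupCachingFn(lambda: {"a": 1, "b": 2})
first = fn.process("a")
retry = fn.process("a")  # simulate the same element being retried
```

&lt;p class="language-py">Because the cached table is read-only and the result is a function of the element alone, a retried element produces the same output as the first attempt.&lt;/p>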
&lt;p>In your processing method, you&amp;rsquo;ll also need to meet some immutability
requirements to ensure that Beam and the processing back-end can safely
serialize and cache the values in your pipeline. Your method should meet the
following requirements:&lt;/p>
&lt;span class="language-java">
&lt;ul>
&lt;li>You should not in any way modify an element provided through
the &lt;code>@Element&lt;/code> annotation (an incoming element from the input
collection) or a value returned by &lt;code>ProcessContext.sideInput()&lt;/code>.&lt;/li>
&lt;li>Once you output a value using &lt;code>OutputReceiver.output()&lt;/code> you should not modify
that value in any way.&lt;/li>
&lt;/ul>
&lt;/span>
&lt;span class="language-py language-typescript">
&lt;ul>
&lt;li>You should not in any way modify the &lt;code>element&lt;/code> argument provided to the
&lt;code>process&lt;/code> method, or any side inputs.&lt;/li>
&lt;li>Once you output a value using &lt;code>yield&lt;/code> or &lt;code>return&lt;/code>, you should not modify
that value in any way.&lt;/li>
&lt;/ul>
&lt;/span>
&lt;span class="language-go">
&lt;ul>
&lt;li>You should not in any way modify the parameters provided to the
&lt;code>ProcessElement&lt;/code> method, or any side inputs.&lt;/li>
&lt;li>Once you output a value using an &lt;code>emitter function&lt;/code>, you should not modify
that value in any way.&lt;/li>
&lt;/ul>
&lt;/span>
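&lt;p class="language-py">One way to respect these requirements, sketched here in plain Python without importing Beam, is to copy the input before changing anything and to stop touching a value once it has been yielded. The class name &lt;code>NormalizeRecordFn&lt;/code> and the record layout are assumptions made for illustration; in a pipeline the class would extend &lt;code>beam.DoFn&lt;/code>.&lt;/p>

```python
import copy


class NormalizeRecordFn:
    """Stand-in for a beam.DoFn subclass; in a pipeline it would extend beam.DoFn."""

    def process(self, element):
        # Copy first so the input element is left untouched.
        record = copy.deepcopy(element)
        record["word"] = record["word"].lower()
        yield record
        # Do not modify `record` after this point: it has been output.


original = {"word": "Beam", "tags": ["x"]}
outputs = list(NormalizeRecordFn().process(original))
```

&lt;p class="language-py">A deep copy is the conservative choice for nested values; for flat, immutable elements such as strings or tuples, building a new value is enough.&lt;/p>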
&lt;h5 id="lightweight-dofns">4.2.1.3. Lightweight DoFns and other abstractions&lt;/h5>
&lt;p>If your function is relatively straightforward, you can simplify your use of
&lt;code>ParDo&lt;/code> by providing a lightweight &lt;code>DoFn&lt;/code> in-line, as
&lt;span class="language-java">an anonymous inner class instance&lt;/span>
&lt;span class="language-py">a lambda function&lt;/span>
&lt;span class="language-go">an anonymous function&lt;/span>
&lt;span class="language-typescript">a function passed to &lt;code>PCollection.map&lt;/code> or &lt;code>PCollection.flatMap&lt;/code>&lt;/span>.&lt;/p>
&lt;p>Here&amp;rsquo;s the previous example, &lt;code>ParDo&lt;/code> with &lt;code>ComputeWordLengthFn&lt;/code>, with the
&lt;code>DoFn&lt;/code> specified as
&lt;span class="language-java">an anonymous inner class instance&lt;/span>
&lt;span class="language-py">a lambda function&lt;/span>
&lt;span class="language-go">an anonymous function&lt;/span>
&lt;span class="language-typescript">a function&lt;/span>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The input PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Apply a ParDo with an anonymous DoFn to the PCollection words.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Save the result as the PCollection wordLengths.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">wordLengths&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;ComputeWordLengths&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="c1">// the transform name
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="c1">// a DoFn as an anonymous inner class instance
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">length&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The input PCollection of strings.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Apply a lambda function to the PCollection words.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Save the result as the PCollection word_lengths.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">word_lengths&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FlatMap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="p">)])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The Go SDK cannot support anonymous functions outside of the deprecated Go Direct runner.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// words is the input PCollection of strings
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">var&lt;/span> &lt;span class="nx">words&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">lengths&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">},&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The input PCollection of strings.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">flatMap&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">length&lt;/span>&lt;span class="p">]);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>If your &lt;code>ParDo&lt;/code> performs a one-to-one mapping of input elements to output
elements&amp;ndash;that is, for each input element, it applies a function that produces
&lt;em>exactly one&lt;/em> output element, &lt;span class="language-go">you can return that
element directly.&lt;/span>&lt;span class="language-java language-py">you can use the higher-level
&lt;span class="language-java">&lt;code>MapElements&lt;/code>&lt;/span>&lt;span class="language-py">&lt;code>Map&lt;/code>&lt;/span>
transform.&lt;/span>&lt;span class="language-typescript">you can use &lt;code>PCollection.map&lt;/code>.&lt;/span> &lt;span class="language-java">&lt;code>MapElements&lt;/code> can accept an anonymous
Java 8 lambda function for additional brevity.&lt;/span>&lt;/p>
&lt;p>Here&amp;rsquo;s the previous example using &lt;span class="language-java">&lt;code>MapElements&lt;/code>&lt;/span>
&lt;span class="language-py language-typescript">&lt;code>Map&lt;/code>&lt;/span>&lt;span class="language-go">a direct return&lt;/span>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The input PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Apply a MapElements with an anonymous lambda function to the PCollection words.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Save the result as the PCollection wordLengths.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">wordLengths&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">integers&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">((&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">length&lt;/span>&lt;span class="o">()));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The input PCollection of strings.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Apply a Map with a lambda function to the PCollection words.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Save the result as the PCollection word_lengths.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">word_lengths&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The Go SDK cannot support anonymous functions outside of the deprecated Go Direct runner.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">wordLengths&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function1x1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">wordLengths&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">applyWordLenAnon&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">words&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">wordLengths&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The input PCollection of string.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">map&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">word&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">length&lt;/span>&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;span class="language-java">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> You can use Java 8 lambda functions with several other Beam
transforms, including &lt;code>Filter&lt;/code>, &lt;code>FlatMapElements&lt;/code>, and &lt;code>Partition&lt;/code>.&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;span class="language-go">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> Anonymous function DoFns do not work on distributed runners.
It&amp;rsquo;s recommended to use named functions and register them with &lt;code>register.FunctionXxY&lt;/code> in
an &lt;code>init()&lt;/code> block.&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;h5 id="dofn">4.2.1.4. DoFn lifecycle&lt;/h5>
&lt;p>The following sequence diagram shows the lifecycle of a DoFn during
the execution of the ParDo transform. The comments in the diagram give
pipeline developers useful information, such as the constraints that
apply to the objects and special cases such as failover or
instance reuse. They also give instantiation use cases. Three key points
to note:&lt;/p>
&lt;ol>
&lt;li>Teardown is done on a best-effort basis and thus
isn&amp;rsquo;t guaranteed.&lt;/li>
&lt;li>The number of DoFn instances created at runtime is runner-dependent.&lt;/li>
&lt;li>For the Python SDK, pipeline contents such as DoFn user code are
&lt;a href="https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pickling-and-managing-the-main-session">serialized into bytecode&lt;/a>. Therefore, &lt;code>DoFn&lt;/code>s should not reference objects that are not serializable, such as locks. To manage a single instance of an object across multiple &lt;code>DoFn&lt;/code> instances in the same process, use utilities in the &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.utils.shared.html">shared.py&lt;/a> module.&lt;/li>
&lt;/ol>
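&lt;p>As a rough illustration of the first two points, the following plain-Python sketch mimics the order in which the lifecycle methods can be called. It does not use Beam: &lt;code>RecordingDoFn&lt;/code> only borrows the Python &lt;code>DoFn&lt;/code> method names, and &lt;code>run_bundles&lt;/code> is a hypothetical stand-in for a runner that reuses one instance across bundles.&lt;/p>

```python
class RecordingDoFn:
    """Plain class that mirrors the Python DoFn lifecycle method names."""

    def __init__(self):
        self.calls = []

    def setup(self):
        self.calls.append("setup")

    def start_bundle(self):
        self.calls.append("start_bundle")

    def process(self, element):
        self.calls.append("process")
        yield len(element)

    def finish_bundle(self):
        self.calls.append("finish_bundle")

    def teardown(self):
        self.calls.append("teardown")


def run_bundles(fn, bundles, call_teardown=True):
    """Hypothetical driver: one instance is reused for every bundle."""
    outputs = []
    fn.setup()
    for bundle in bundles:
        fn.start_bundle()
        for element in bundle:
            outputs.extend(fn.process(element))
        fn.finish_bundle()
    if call_teardown:  # best effort only, so a real runner may skip this
        fn.teardown()
    return outputs


fn = RecordingDoFn()
lengths = run_bundles(fn, [["cat", "tree"], ["jump"]])
print(lengths)
print(fn.calls)
```

&lt;p>Because teardown might never run and the number of instances is up to the runner, state that must be flushed is usually better handled at the end of each bundle than in teardown.&lt;/p>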
&lt;!-- The source for the sequence diagram can be found in the SVG resource. -->
&lt;p>&lt;img src="/images/dofn-sequence-diagram.svg" alt="This is a sequence diagram that shows the lifecycle of the DoFn">&lt;/p>
&lt;h4 id="groupbykey">4.2.2. GroupByKey&lt;/h4>
&lt;p>&lt;code>GroupByKey&lt;/code> is a Beam transform for processing collections of key/value pairs.
It&amp;rsquo;s a parallel reduction operation, analogous to the Shuffle phase of a
Map/Shuffle/Reduce-style algorithm. The input to &lt;code>GroupByKey&lt;/code> is a collection of
key/value pairs that represents a &lt;em>multimap&lt;/em>, where the collection contains
multiple pairs that have the same key, but different values. Given such a
collection, you use &lt;code>GroupByKey&lt;/code> to collect all of the values associated with
each unique key.&lt;/p>
&lt;p>&lt;code>GroupByKey&lt;/code> is a good way to aggregate data that has something in common. For
example, if you have a collection that stores records of customer orders, you
might want to group together all the orders from the same postal code (wherein
the &amp;ldquo;key&amp;rdquo; of the key/value pair is the postal code field, and the &amp;ldquo;value&amp;rdquo; is the
remainder of the record).&lt;/p>
&lt;p>Let&amp;rsquo;s examine the mechanics of &lt;code>GroupByKey&lt;/code> with a simple example case, where
our data set consists of words from a text file and the line number on which
they appear. We want to group together all the line numbers (values) that share
the same word (key), letting us see all the places in the text where a
particular word appears.&lt;/p>
&lt;p>Our input is a &lt;code>PCollection&lt;/code> of key/value pairs where each word is a key, and
the value is a line number in the file where the word appears. Here&amp;rsquo;s a list of
the key/value pairs in the input collection:&lt;/p>
&lt;pre tabindex="0">&lt;code>cat, 1
dog, 5
and, 1
jump, 3
tree, 2
cat, 5
dog, 2
and, 2
cat, 9
and, 6
...
&lt;/code>&lt;/pre>&lt;p>&lt;code>GroupByKey&lt;/code> gathers up all the values with the same key and outputs a new pair
consisting of the unique key and a collection of all of the values that were
associated with that key in the input collection. If we apply &lt;code>GroupByKey&lt;/code> to
our input collection above, the output collection would look like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>cat, [1,5,9]
dog, [5,2]
and, [1,2,6]
jump, [3]
tree, [2]
...
&lt;/code>&lt;/pre>&lt;p>Thus, &lt;code>GroupByKey&lt;/code> represents a transform from a multimap (multiple keys to
individual values) to a uni-map (unique keys to collections of values).&lt;/p>
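&lt;p>The grouping semantics can be sketched in plain Python without any Beam dependency. This is only an in-memory illustration under the assumption that all pairs fit in one process; the helper name &lt;code>group_by_key&lt;/code> is hypothetical and is not the Beam transform.&lt;/p>

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect every value that shares a key, keeping first-seen key order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# The (word, line number) pairs listed in the example input.
pairs = [
    ("cat", 1), ("dog", 5), ("and", 1), ("jump", 3), ("tree", 2),
    ("cat", 5), ("dog", 2), ("and", 2), ("cat", 9), ("and", 6),
]

grouped = group_by_key(pairs)
for word, lines in grouped.items():
    print(word, lines)
```

&lt;p>The sketch keeps values in input order because it runs sequentially; a distributed runner shuffles data between workers, so do not rely on any particular ordering of the grouped values.&lt;/p>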
&lt;p>&lt;span class="language-java">Using &lt;code>GroupByKey&lt;/code> is straightforward:&lt;/span>&lt;/p>
&lt;p class="language-py language-typescript language-yaml">While all SDKs have a &lt;code>GroupByKey&lt;/code> transform, using &lt;code>GroupBy&lt;/code> is
generally more natural.
The &lt;code>GroupBy&lt;/code> transform can be parameterized by the name(s) of properties
on which to group the elements of the PCollection, or by a function that takes
each element as input and returns the key on which to group.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The input PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">mapped&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Apply GroupByKey to the PCollection mapped.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Save the result as the PCollection reduced.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">reduced&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">mapped&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">GroupByKey&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">create&lt;/span>&lt;span class="o">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The input PCollection of (`string`, `int`) tuples.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">words_and_counts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">grouped_words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words_and_counts&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GroupByKey&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// CreateAndSplit creates and returns a PCollection with &amp;lt;K,V&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// from an input slice of stringPair (struct with K, V string fields).
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">pairs&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">CreateAndSplit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">input&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">keyed&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GroupByKey&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">pairs&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// A PCollection of elements like
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// {word: &amp;#34;cat&amp;#34;, score: 1}, {word: &amp;#34;dog&amp;#34;, score: 5}, {word: &amp;#34;cat&amp;#34;, score: 5}, ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kr">const&lt;/span> &lt;span class="nx">scores&lt;/span> : &lt;span class="kt">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">word&lt;/span>: &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">score&lt;/span>: &lt;span class="kt">number&lt;/span>&lt;span class="p">}&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// This will produce a PCollection with elements like
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// {key: &amp;#34;cat&amp;#34;, value: [{ word: &amp;#34;cat&amp;#34;, score: 1 },
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// { word: &amp;#34;cat&amp;#34;, score: 5 }, ...]}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// {key: &amp;#34;dog&amp;#34;, value: [{ word: &amp;#34;dog&amp;#34;, score: 5 }, ...]}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kr">const&lt;/span> &lt;span class="nx">grouped_by_word&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">scores&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">groupBy&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;word&amp;#34;&lt;/span>&lt;span class="p">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// This will produce a PCollection with elements like
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// {key: 3, value: [{ word: &amp;#34;cat&amp;#34;, score: 1 },
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// { word: &amp;#34;dog&amp;#34;, score: 5 },
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// { word: &amp;#34;cat&amp;#34;, score: 5 }, ...]}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kr">const&lt;/span> &lt;span class="nx">by_word_length&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">scores&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">groupBy&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">x&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">x&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">length&lt;/span>&lt;span class="p">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Combine&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">group_by&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">animal&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">combine&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">weight&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">group&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h5 id="groupbykey-and-unbounded-pcollections">4.2.2.1. GroupByKey and unbounded PCollections&lt;/h5>
&lt;p>If you are using unbounded &lt;code>PCollection&lt;/code>s, you must use either &lt;a href="#setting-your-pcollections-windowing-function">non-global
windowing&lt;/a> or an
&lt;a href="#triggers">aggregation trigger&lt;/a> in order to perform a &lt;code>GroupByKey&lt;/code> or
&lt;a href="#cogroupbykey">CoGroupByKey&lt;/a>. This is because a bounded &lt;code>GroupByKey&lt;/code> or
&lt;code>CoGroupByKey&lt;/code> must wait for all the data with a certain key to be collected,
but with unbounded collections, the data is unlimited. Windowing and/or triggers
allow grouping to operate on logical, finite bundles of data within the
unbounded data streams.&lt;/p>
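&lt;p>To make the idea of finite bundles concrete, here is a minimal plain-Python sketch that groups per key and per fixed window instead of per key alone. It is not Beam code: the 5-minute window size, the sample events, and the helper names are assumptions for illustration, and the sketch ignores watermarks, triggers, and late data.&lt;/p>

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # hypothetical 5-minute fixed windows


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the fixed window that contains the timestamp."""
    return timestamp - (timestamp % size)


def group_by_key_and_window(events, size=WINDOW_SECONDS):
    """Group values per (key, window start) rather than per key alone."""
    grouped = defaultdict(list)
    for key, value, timestamp in events:
        grouped[(key, window_start(timestamp, size))].append(value)
    return dict(grouped)


# Hypothetical (key, value, event time in seconds) elements.
events = [
    ("cat", 1, 10),
    ("dog", 5, 20),
    ("cat", 5, 290),
    ("cat", 9, 310),
    ("dog", 2, 615),
]

windowed = group_by_key_and_window(events)
for (key, start), values in windowed.items():
    print(key, start, values)
```

&lt;p>Each (key, window) group is finite even if the stream never ends, which is what lets a grouping produce output at all on unbounded input.&lt;/p>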
&lt;p>If you do apply &lt;code>GroupByKey&lt;/code> or &lt;code>CoGroupByKey&lt;/code> to a group of unbounded
&lt;code>PCollection&lt;/code>s without setting either a non-global windowing strategy, a trigger
strategy, or both for each collection, Beam generates an IllegalStateException
error at pipeline construction time.&lt;/p>
&lt;p>When using &lt;code>GroupByKey&lt;/code> or &lt;code>CoGroupByKey&lt;/code> to group &lt;code>PCollection&lt;/code>s that have a
&lt;a href="#windowing">windowing strategy&lt;/a> applied, all of the &lt;code>PCollection&lt;/code>s you want to
group &lt;em>must use the same windowing strategy&lt;/em> and window sizing. For example, all
of the collections you are merging must use (hypothetically) identical 5-minute
fixed windows, or 4-minute sliding windows starting every 30 seconds.&lt;/p>
&lt;p>If your pipeline attempts to use &lt;code>GroupByKey&lt;/code> or &lt;code>CoGroupByKey&lt;/code> to merge
&lt;code>PCollection&lt;/code>s with incompatible windows, Beam generates an
IllegalStateException error at pipeline construction time.&lt;/p>
&lt;h4 id="cogroupbykey">4.2.3. CoGroupByKey&lt;/h4>
&lt;p>&lt;code>CoGroupByKey&lt;/code> performs a relational join of two or more key/value
&lt;code>PCollection&lt;/code>s that have the same key type.
&lt;a href="/documentation/pipelines/design-your-pipeline/#multiple-sources">Design Your Pipeline&lt;/a>
shows an example pipeline that uses a join.&lt;/p>
&lt;p>Consider using &lt;code>CoGroupByKey&lt;/code> if you have multiple data sets that provide
information about related things. For example, let&amp;rsquo;s say you have two different
files with user data: one file has names and email addresses; the other file
has names and phone numbers. You can join those two data sets, using the user
name as a common key and the other data as the associated values. After the
join, you have one data set that contains all of the information (email
addresses and phone numbers) associated with each name.&lt;/p>
&lt;p>You can also use &lt;code>SqlTransform&lt;/code> to perform a join.&lt;/p>
&lt;p>If you are using unbounded &lt;code>PCollection&lt;/code>s, you must use either &lt;a href="#setting-your-pcollections-windowing-function">non-global
windowing&lt;/a> or an
&lt;a href="#triggers">aggregation trigger&lt;/a> in order to perform a &lt;code>CoGroupByKey&lt;/code>. See
&lt;a href="#groupbykey-and-unbounded-pcollections">GroupByKey and unbounded PCollections&lt;/a>
for more details.&lt;/p>
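&lt;p>The join described above can be sketched in plain Python. This is an in-memory illustration only, not the Beam transform: the helper &lt;code>co_group_by_key&lt;/code>, the tag names, and the sample records are assumptions chosen for the example.&lt;/p>

```python
def co_group_by_key(tagged_inputs):
    """Map each key to a dict of tag -> list of values from that input."""
    tags = list(tagged_inputs)
    result = {}
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            # Every key gets an entry for every tag, even if that list stays empty.
            per_key = result.setdefault(key, {t: [] for t in tags})
            per_key[tag].append(value)
    return result


# Sample records keyed by user name.
emails = [("amy", "amy@example.com"), ("carl", "carl@example.com")]
phones = [("amy", "111-222-3333"), ("james", "222-333-4444")]

joined = co_group_by_key({"emails": emails, "phones": phones})
for name, info in joined.items():
    print(name, info)
```

&lt;p>Keys that appear in only one input still show up in the result with an empty list for the other tag, so you can decide downstream whether to treat the join as inner or outer.&lt;/p>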
&lt;p class="language-java">In the Beam SDK for Java, &lt;code>CoGroupByKey&lt;/code> accepts a tuple of keyed
&lt;code>PCollection&lt;/code>s (&lt;code>PCollection&amp;lt;KV&amp;lt;K, V&amp;gt;&amp;gt;&lt;/code>) as input. For type safety, the SDK
requires you to pass each &lt;code>PCollection&lt;/code> as part of a &lt;code>KeyedPCollectionTuple&lt;/code>.
You must declare a &lt;code>TupleTag&lt;/code> for each input &lt;code>PCollection&lt;/code> in the
&lt;code>KeyedPCollectionTuple&lt;/code> that you want to pass to &lt;code>CoGroupByKey&lt;/code>. As output,
&lt;code>CoGroupByKey&lt;/code> returns a &lt;code>PCollection&amp;lt;KV&amp;lt;K, CoGbkResult&amp;gt;&amp;gt;&lt;/code>, which groups values
from all the input &lt;code>PCollection&lt;/code>s by their common keys. Each key (all of type
&lt;code>K&lt;/code>) will have a different &lt;code>CoGbkResult&lt;/code>, which is a map from &lt;code>TupleTag&amp;lt;T&amp;gt;&lt;/code> to
&lt;code>Iterable&amp;lt;T&amp;gt;&lt;/code>. You can access a specific collection in a &lt;code>CoGbkResult&lt;/code> object
by using the &lt;code>TupleTag&lt;/code> that you supplied with the initial collection.&lt;/p>
&lt;p class="language-py">In the Beam SDK for Python, &lt;code>CoGroupByKey&lt;/code> accepts a dictionary of keyed
&lt;code>PCollection&lt;/code>s as input. As output, &lt;code>CoGroupByKey&lt;/code> creates a single output
&lt;code>PCollection&lt;/code> that contains one key/value tuple for each key in the input
&lt;code>PCollection&lt;/code>s. Each key&amp;rsquo;s value is a dictionary that maps each tag to an
iterable of the values under the key in the corresponding &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p class="language-go">In the Beam Go SDK, &lt;code>CoGroupByKey&lt;/code> accepts an arbitrary number of
&lt;code>PCollection&lt;/code>s as input. As output, &lt;code>CoGroupByKey&lt;/code> creates a single output
&lt;code>PCollection&lt;/code> that groups each key with value iterator functions for each
input &lt;code>PCollection&lt;/code>. The iterator functions map to input &lt;code>PCollection&lt;/code>s in
the same order they were provided to the &lt;code>CoGroupByKey&lt;/code>.&lt;/p>
&lt;p>The following conceptual examples use two input collections to show the mechanics of
&lt;code>CoGroupByKey&lt;/code>.&lt;/p>
&lt;p class="language-java">The first set of data has a &lt;code>TupleTag&amp;lt;String&amp;gt;&lt;/code> called &lt;code>emailsTag&lt;/code> and contains names
and email addresses. The second set of data has a &lt;code>TupleTag&amp;lt;String&amp;gt;&lt;/code> called
&lt;code>phonesTag&lt;/code> and contains names and phone numbers.&lt;/p>
&lt;p class="language-py language-go">The first set of data contains names and email addresses. The second set of
data contains names and phone numbers.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">emailsList&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;amy@example.com&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;carl@example.com&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;julia&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;julia@example.com&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;carl@email.com&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">phonesList&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;111-222-3333&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;james&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;222-333-4444&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;333-444-5555&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;444-555-6666&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">emails&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;CreateEmails&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Create&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">emailsList&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">phones&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;CreatePhones&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Create&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">phonesList&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">emails_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;amy&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;amy@example.com&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;carl&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;carl@example.com&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;julia&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;julia@example.com&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;carl&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;carl@email.com&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">phones_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;amy&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;111-222-3333&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;james&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;222-333-4444&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;amy&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;333-444-5555&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;carl&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;444-555-6666&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">emails&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;CreateEmails&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">emails_list&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">phones&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;CreatePhones&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">phones_list&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">stringPair&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">K&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">V&lt;/span> &lt;span class="kt">string&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">splitStringPair&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">e&lt;/span> &lt;span class="nx">stringPair&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">e&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">K&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">e&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">V&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Register DoFn.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function1x2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">splitStringPair&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// CreateAndSplit is a helper function that creates a PCollection from input and splits each stringPair into a key and a value.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="nf">CreateAndSplit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">input&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="nx">stringPair&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">initial&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">CreateList&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">input&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">splitStringPair&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">initial&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="nx">emailSlice&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="nx">stringPair&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;amy@example.com&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;carl@example.com&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;julia&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;julia@example.com&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;carl@email.com&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="nx">phoneSlice&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="nx">stringPair&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;111-222-3333&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;james&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;222-333-4444&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;333-444-5555&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;444-555-6666&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">emails&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">CreateAndSplit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scope&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;CreateEmails&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">emailSlice&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">phones&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">CreateAndSplit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scope&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;CreatePhones&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">phoneSlice&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">emails_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">email&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;amy@example.com&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">email&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl@example.com&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;julia&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">email&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;julia@example.com&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">email&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl@email.com&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">];&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">phones_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phone&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;111-222-3333&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;james&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phone&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;222-333-4444&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phone&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;333-444-5555&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phone&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;444-555-6666&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">];&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">emails&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">withName&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;createEmails&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">emails_list&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">phones&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">withName&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;createPhones&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">phones_list&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">- &lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Create&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">CreateEmails&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">elements&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- {&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="nt">, email&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;amy@example.com&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- {&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="nt">, email&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;carl@example.com&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- {&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;julia&amp;#34;&lt;/span>&lt;span class="nt">, email&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;julia@example.com&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- {&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="nt">, email&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;carl@email.com&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>- &lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Create&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">CreatePhones&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">elements&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- {&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="nt">, phone&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;111-222-3333&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- {&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;james&amp;#34;&lt;/span>&lt;span class="nt">, phone&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;222-333-4444&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- {&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="nt">, phone&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;333-444-5555&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>}&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>- {&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="nt">, phone&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;444-555-6666&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
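&lt;p>After &lt;code>CoGroupByKey&lt;/code>, the result contains one entry for each unique key found in
any of the input collections, holding all of the values associated with that key. A key that is
missing from one input gets an empty collection for that input.&lt;/p>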
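&lt;p>As a rough illustration of this grouping, the following plain-Python sketch mimics the shape of the result without running a pipeline. The helper &lt;code>co_group_by_key&lt;/code> is hypothetical and is not part of the Beam API. It assumes each input is a list of key/value tuples, as in the Python example data above.&lt;/p>

```python
from collections import defaultdict


def co_group_by_key(**tagged_inputs):
    """Plain-Python stand-in for CoGroupByKey semantics (hypothetical helper, not a Beam API)."""
    tags = list(tagged_inputs)
    # Every key starts with an empty list under every tag, so a key that is
    # missing from one input still shows up with an empty list for that input.
    grouped = defaultdict(lambda: {tag: [] for tag in tags})
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            grouped[key][tag].append(value)
    return dict(grouped)


emails_list = [
    ('amy', 'amy@example.com'),
    ('carl', 'carl@example.com'),
    ('julia', 'julia@example.com'),
    ('carl', 'carl@email.com'),
]
phones_list = [
    ('amy', '111-222-3333'),
    ('james', '222-333-4444'),
    ('amy', '333-444-5555'),
    ('carl', '444-555-6666'),
]

results = co_group_by_key(emails=emails_list, phones=phones_list)
for key in sorted(results):
    print(key, results[key])
```

&lt;p>This sketch keeps values in input order. In a real pipeline, the order of the values grouped under a key is not guaranteed, so code that consumes the result should not depend on it.&lt;/p>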
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">emailsTag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">phonesTag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">expectedResults&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">emailsTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;amy@example.com&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">phonesTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;111-222-3333&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;333-444-5555&amp;#34;&lt;/span>&lt;span class="o">))),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">emailsTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;carl@email.com&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;carl@example.com&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">phonesTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;444-555-6666&amp;#34;&lt;/span>&lt;span class="o">))),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;james&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">emailsTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">phonesTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;222-333-4444&amp;#34;&lt;/span>&lt;span class="o">))),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;julia&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">emailsTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;julia@example.com&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">phonesTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">())));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;amy&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;emails&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;amy@example.com&amp;#39;&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;phones&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;111-222-3333&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;333-444-5555&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;carl&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;emails&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;carl@email.com&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;carl@example.com&amp;#39;&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;phones&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;444-555-6666&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;james&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;emails&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[],&lt;/span> &lt;span class="s1">&amp;#39;phones&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;222-333-4444&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;julia&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;emails&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;julia@example.com&amp;#39;&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="s1">&amp;#39;phones&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">results&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">CoGroupByKey&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emails&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phones&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">contactLines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">formatCoGBKResults&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">results&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Synthetic example results of a cogbk.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">results&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Key&lt;/span> &lt;span class="kt">string&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Emails&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">Phones&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Key&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Emails&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;amy@example.com&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Phones&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;111-222-3333&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;333-444-5555&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Key&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Emails&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;carl@email.com&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;carl@example.com&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Phones&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;444-555-6666&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Key&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s">&amp;#34;james&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Emails&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Phones&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;222-333-4444&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Key&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s">&amp;#34;julia&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Emails&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;julia@example.com&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Phones&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">values&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">emails&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">email&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;amy@example.com&amp;#34;&lt;/span> &lt;span class="p">}],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">phones&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phone&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;111-222-3333&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phone&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;333-444-5555&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">values&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">emails&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">email&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl@example.com&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">email&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl@email.com&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">phones&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phone&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;444-555-6666&amp;#34;&lt;/span> &lt;span class="p">}],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;james&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">values&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">emails&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">phones&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;james&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phone&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;222-333-4444&amp;#34;&lt;/span> &lt;span class="p">}],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;julia&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">values&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">emails&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[{&lt;/span> &lt;span class="nx">name&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;julia&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">email&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;julia@example.com&amp;#34;&lt;/span> &lt;span class="p">}],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">phones&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">];&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java language-py language-typescript">The following code example joins the two &lt;code>PCollection&lt;/code>s with &lt;code>CoGroupByKey&lt;/code>,
then applies a &lt;code>ParDo&lt;/code> to consume the result. The code uses the tags to look up
and format the data from each collection.&lt;/p>
&lt;p class="language-go">The following code example joins the two &lt;code>PCollection&lt;/code>s with &lt;code>CoGroupByKey&lt;/code>,
then applies a &lt;code>ParDo&lt;/code> to consume the result. The order of the &lt;code>DoFn&lt;/code> iterator
parameters matches the order of the &lt;code>CoGroupByKey&lt;/code> inputs.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KeyedPCollectionTuple&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">emailsTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">emails&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">phonesTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">phones&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">CoGroupByKey&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">contactLines&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">results&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">e&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="n">name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">e&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getKey&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">emailsIter&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">e&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getAll&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">emailsTag&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">phonesIter&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">e&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getAll&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">phonesTag&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="n">formattedResult&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Snippets&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">formatCoGbkResults&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">emailsIter&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">phonesIter&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">formattedResult&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The result PCollection contains one key-value element for each key in the&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># input PCollections. The key of the pair will be the key from the input and&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># the value will be a dictionary with two entries: &amp;#39;emails&amp;#39;: an iterable of&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># all values for the current key in the emails PCollection and &amp;#39;phones&amp;#39;: an&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># iterable of all values for the current key in the phones PCollection.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">({&lt;/span>&lt;span class="s1">&amp;#39;emails&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">emails&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;phones&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">phones&lt;/span>&lt;span class="p">}&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CoGroupByKey&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">join_info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">name_info&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">info&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">name_info&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="s1">&amp;#39;&lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">; &lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">; &lt;/span>&lt;span class="si">%s&lt;/span>&lt;span class="s1">&amp;#39;&lt;/span> &lt;span class="o">%&lt;/span>\
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">sorted&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">info&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;emails&amp;#39;&lt;/span>&lt;span class="p">]),&lt;/span> &lt;span class="nb">sorted&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">info&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;phones&amp;#39;&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">contact_lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">results&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">join_info&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
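&lt;p class="language-py">To make the shape of the joined result concrete, the grouping can be emulated in plain Python without a pipeline. This is only a minimal sketch of the semantics, not how Beam implements &lt;code>CoGroupByKey&lt;/code>: the helper &lt;code>co_group_by_key&lt;/code> is a hypothetical name, and the input lists are assumed values chosen to reproduce the example results shown earlier.&lt;/p>

```python
from collections import defaultdict

# Assumed input data, chosen to reproduce the example results shown earlier.
emails = [
    ('amy', 'amy@example.com'),
    ('carl', 'carl@example.com'),
    ('julia', 'julia@example.com'),
    ('carl', 'carl@email.com'),
]
phones = [
    ('amy', '111-222-3333'),
    ('james', '222-333-4444'),
    ('amy', '333-444-5555'),
    ('carl', '444-555-6666'),
]


def co_group_by_key(tagged_inputs):
    """Hypothetical helper that emulates the grouping semantics in memory.

    tagged_inputs maps a tag to a list of (key, value) pairs. The result is a
    list of (key, {tag: [values]}) tuples, one per distinct key. Every tag is
    present for every key, with an empty list when that input has no values.
    """
    tags = list(tagged_inputs)
    grouped = defaultdict(lambda: {tag: [] for tag in tags})
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            grouped[key][tag].append(value)
    # Sorted by key only to make this sketch deterministic.
    return sorted(grouped.items())


def join_info(name_info):
    # Same formatting as the pipeline example: sort values, then format.
    name, info = name_info
    return '%s; %s; %s' % (name, sorted(info['emails']), sorted(info['phones']))


results = co_group_by_key({'emails': emails, 'phones': phones})
contact_lines = [join_info(result) for result in results]
```

&lt;p class="language-py">In this sketch a key that appears in only one input, such as &lt;code>james&lt;/code>, still produces an element, with an empty list under the other tag. The values are sorted before formatting because the grouped values have no guaranteed order.&lt;/p>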
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">formatCoGBKResults&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emailIter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phoneIter&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">bool&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">var&lt;/span> &lt;span class="nx">s&lt;/span> &lt;span class="kt">string&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">var&lt;/span> &lt;span class="nx">emails&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phones&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nf">emailIter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">emails&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nb">append&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">emails&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nf">phoneIter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">phones&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nb">append&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">phones&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Values have no guaranteed order, sort for deterministic output.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">sort&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Strings&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">emails&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">sort&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Strings&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">phones&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Sprintf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;%s; %s; %s&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nf">formatStringIter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">emails&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nf">formatStringIter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">phones&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function3x1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">formatCoGBKResults&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// 1 input of type string =&amp;gt; Iter1[string]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Iter1&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">]()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Synthetic example results of a cogbk.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">results&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Key&lt;/span> &lt;span class="kt">string&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Emails&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">Phones&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Key&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s">&amp;#34;amy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Emails&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;amy@example.com&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Phones&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;111-222-3333&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;333-444-5555&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Key&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s">&amp;#34;carl&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Emails&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;carl@email.com&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;carl@example.com&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Phones&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;444-555-6666&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Key&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s">&amp;#34;james&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Emails&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Phones&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;222-333-4444&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Key&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s">&amp;#34;julia&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Emails&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;julia@example.com&amp;#34;&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Phones&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">formatted_results_pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">P&lt;/span>&lt;span class="p">({&lt;/span> &lt;span class="nx">emails&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">phones&lt;/span> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">coGroupBy&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">function&lt;/span> &lt;span class="nx">formatResults&lt;/span>&lt;span class="p">({&lt;/span> &lt;span class="nx">key&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">values&lt;/span> &lt;span class="p">})&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kr">const&lt;/span> &lt;span class="nx">emails&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">values&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">emails&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">map&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">x&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">x&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">email&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nx">sort&lt;/span>&lt;span class="p">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kr">const&lt;/span> &lt;span class="nx">phones&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">values&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">phones&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">map&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">x&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">x&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">phone&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nx">sort&lt;/span>&lt;span class="p">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="sb">`&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nx">key&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="sb">; [&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nx">emails&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="sb">]; [&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nx">phones&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="sb">]`&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">});&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">- &lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MapToFields&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">PrepareEmails&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">CreateEmails&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">language&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">python&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fields&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">name&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">email&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;[email]&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">phone&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;[]&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>- &lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MapToFields&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">PreparePhones&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">CreatePhones&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">language&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">python&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fields&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">name&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">email&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;[]&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">phone&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;[phone]&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>- &lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Combine&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">CoGroupBy&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="l">PrepareEmails, PreparePhones]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">group_by&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="l">name]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">combine&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">email&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">concat&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">phone&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">concat&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>- &lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MapToFields&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">FormatResults&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">CoGroupBy&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">language&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">python&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fields&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">formatted&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;&amp;#39;%s; %s; %s&amp;#39; % (name, sorted(email), sorted(phone))&amp;#34;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>The formatted data looks like this:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">formattedResults&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;amy; [&amp;#39;amy@example.com&amp;#39;]; [&amp;#39;111-222-3333&amp;#39;, &amp;#39;333-444-5555&amp;#39;]&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;carl; [&amp;#39;carl@email.com&amp;#39;, &amp;#39;carl@example.com&amp;#39;]; [&amp;#39;444-555-6666&amp;#39;]&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;james; []; [&amp;#39;222-333-4444&amp;#39;]&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;julia; [&amp;#39;julia@example.com&amp;#39;]; []&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">formatted_results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;amy; [&amp;#39;amy@example.com&amp;#39;]; [&amp;#39;111-222-3333&amp;#39;, &amp;#39;333-444-5555&amp;#39;]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;carl; [&amp;#39;carl@email.com&amp;#39;, &amp;#39;carl@example.com&amp;#39;]; [&amp;#39;444-555-6666&amp;#39;]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;james; []; [&amp;#39;222-333-4444&amp;#39;]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;julia; [&amp;#39;julia@example.com&amp;#39;]; []&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">formattedResults&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;amy; [&amp;#39;amy@example.com&amp;#39;]; [&amp;#39;111-222-3333&amp;#39;, &amp;#39;333-444-5555&amp;#39;]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;carl; [&amp;#39;carl@email.com&amp;#39;, &amp;#39;carl@example.com&amp;#39;]; [&amp;#39;444-555-6666&amp;#39;]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;james; []; [&amp;#39;222-333-4444&amp;#39;]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;julia; [&amp;#39;julia@example.com&amp;#39;]; []&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">formatted_results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;amy; [amy@example.com]; [111-222-3333,333-444-5555]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;carl; [carl@email.com,carl@example.com]; [444-555-6666]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;james; []; [222-333-4444]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;julia; [julia@example.com]; []&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">];&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="s2">&amp;#34;amy; [&amp;#39;amy@example.com&amp;#39;]; [&amp;#39;111-222-3333&amp;#39;, &amp;#39;333-444-5555&amp;#39;]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="s2">&amp;#34;carl; [&amp;#39;carl@email.com&amp;#39;, &amp;#39;carl@example.com&amp;#39;]; [&amp;#39;444-555-6666&amp;#39;]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="s2">&amp;#34;james; []; [&amp;#39;222-333-4444&amp;#39;]&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="s2">&amp;#34;julia; [&amp;#39;julia@example.com&amp;#39;]; []&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
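&lt;p>As a rough illustration of how the quoted-list strings above come about, the following plain-Python sketch applies the same &lt;code>'%s; %s; %s'&lt;/code> expression used in the YAML snippet to two of the synthetic rows. It does not use the Beam SDK; the &lt;code>results&lt;/code> tuples and the &lt;code>format_result&lt;/code> helper are hypothetical names introduced only for this example.&lt;/p>

```python
# Hypothetical stand-in for grouped results: (name, emails, phones).
results = [
    ("amy", ["amy@example.com"], ["333-444-5555", "111-222-3333"]),
    ("james", [], ["222-333-4444"]),
]


def format_result(name, emails, phones):
    # Sorting makes the output deterministic; the list repr adds the
    # brackets and single quotes seen in the formatted strings.
    return "%s; %s; %s" % (name, sorted(emails), sorted(phones))


formatted = [format_result(*row) for row in results]
for line in formatted:
    print(line)
```

&lt;p>Sorting before formatting matters because grouped values carry no guaranteed order.&lt;/p>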
&lt;h4 id="combine">4.2.4. Combine&lt;/h4>
&lt;p>&lt;span class="language-java">&lt;a href="https://beam.apache.org/releases/javadoc/2.56.0/index.html?org/apache/beam/sdk/transforms/Combine.html">&lt;code>Combine&lt;/code>&lt;/a>&lt;/span>
&lt;span class="language-py">&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py">&lt;code>Combine&lt;/code>&lt;/a>&lt;/span>
&lt;span class="language-go">&lt;a href="https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/combine.go#L27">&lt;code>Combine&lt;/code>&lt;/a>&lt;/span>
&lt;span class="language-typescript">&lt;a href="https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/transforms/group_and_combine.ts">&lt;code>Combine&lt;/code>&lt;/a>&lt;/span>
is a Beam transform for combining collections of elements or values in your
data. &lt;code>Combine&lt;/code> has variants that work on entire &lt;code>PCollection&lt;/code>s, and some that
combine the values for each key in &lt;code>PCollection&lt;/code>s of key/value pairs.&lt;/p>
&lt;p>When you apply a &lt;code>Combine&lt;/code> transform, you must provide the function that
contains the logic for combining the elements or values. The combining function
should be commutative and associative, as the function is not necessarily
invoked exactly once on all values with a given key. Because the input data
(including the value collection) may be distributed across multiple workers, the
combining function might be called multiple times to perform partial combining
on subsets of the value collection. The Beam SDK also provides some pre-built
combine functions for common numeric combination operations such as sum, min,
and max.&lt;/p>
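&lt;p>To see why associativity and commutativity matter, the following plain-Python sketch mimics partial combining without using the Beam SDK. The helper &lt;code>combine_in_parts&lt;/code> and the chunking scheme are assumptions made for illustration; a real runner decides how values are split.&lt;/p>

```python
def combine_in_parts(combine, values, chunk_size):
    """Combine fixed-size chunks first, then combine the partial results."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    partials = [combine(chunk) for chunk in chunks]
    return combine(partials)


def mean(values):
    return sum(values) / len(values)


values = [1, 2, 3, 4, 100]

# sum is associative and commutative, so partial combining is safe.
whole_sum = sum(values)
partial_sum = combine_in_parts(sum, values, 2)

# A plain mean is not: the mean of partial means differs from the true mean.
whole_mean = mean(values)
partial_mean = combine_in_parts(mean, values, 2)

print(whole_sum, partial_sum)    # 110 110
print(whole_mean, partial_mean)  # 22.0 35.0
```

&lt;p>The sum is unaffected by how the values are split, while the naive mean changes. Operations like mean therefore need an intermediate accumulator rather than a single simple function.&lt;/p>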
&lt;p>Simple combine operations, such as sums, can usually be implemented as a simple
function. More complex combination operations might require you to create a
&lt;span class="language-java language-py">subclass of&lt;/span> &lt;code>CombineFn&lt;/code>
that has an accumulation type distinct from the input/output type.&lt;/p>
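&lt;p>As a minimal sketch of an accumulation type that differs from the input and output types, the plain-Python class below computes a mean using a (sum, count) tuple. It mirrors the four method names of the Python SDK's &lt;code>CombineFn&lt;/code> but deliberately does not subclass it, so that it runs without Beam installed. The class name &lt;code>MeanAccumulatorSketch&lt;/code> and the &lt;code>accumulate&lt;/code> helper are hypothetical.&lt;/p>

```python
class MeanAccumulatorSketch:
    """Plain-Python stand-in that mirrors the four CombineFn methods."""

    def create_accumulator(self):
        return (0.0, 0)  # (running sum, count)

    def add_input(self, accumulator, value):
        total, count = accumulator
        return (total + value, count + 1)

    def merge_accumulators(self, accumulators):
        totals, counts = zip(*accumulators)
        return (sum(totals), sum(counts))

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


def accumulate(fn, values):
    """Fold one subset of values into a single accumulator."""
    acc = fn.create_accumulator()
    for value in values:
        acc = fn.add_input(acc, value)
    return acc


fn = MeanAccumulatorSketch()
# Three subsets, as if they had been processed on different workers.
partials = [accumulate(fn, [1, 2]), accumulate(fn, [3, 4]), accumulate(fn, [100])]
result = fn.extract_output(fn.merge_accumulators(partials))
print(result)  # 22.0
```

&lt;p>Because the accumulators merge associatively, the result is the true mean of all five values regardless of how they were split.&lt;/p>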
&lt;p>The associativity and commutativity of a &lt;code>CombineFn&lt;/code> allows runners to
automatically apply some optimizations:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Combiner lifting&lt;/strong>: This is the most significant optimization. Input
elements are combined per key and window before they are shuffled, so the
volume of data shuffled might be reduced by many orders of magnitude. Another
term for this optimization is &amp;ldquo;mapper-side combine.&amp;rdquo;&lt;/li>
&lt;li>&lt;strong>Incremental combining&lt;/strong>: When you have a &lt;code>CombineFn&lt;/code> that substantially
reduces the data size, it is useful to combine elements as they emerge from a
streaming shuffle. This spreads out the cost of doing combines over the time
that your streaming computation might be idle. Incremental combining also
reduces the storage of intermediate accumulators.&lt;/li>
&lt;/ul>
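&lt;p>The effect of combiner lifting can be sketched in plain Python by counting how many records would cross the shuffle with and without a per-worker pre-combine. This is a simulation under simplifying assumptions (two hypothetical workers, a per-key sum, no windowing), not how any particular runner implements it.&lt;/p>

```python
from collections import defaultdict


def shuffle_without_lifting(worker_inputs):
    """Every (key, value) record crosses the shuffle."""
    return [record for records in worker_inputs for record in records]


def shuffle_with_lifting(worker_inputs):
    """Each worker pre-sums its values per key, then emits one record per key."""
    shuffled = []
    for records in worker_inputs:
        local = defaultdict(int)
        for key, value in records:
            local[key] += value
        shuffled.extend(local.items())
    return shuffled


def sum_per_key(records):
    """Final combine step that runs after the shuffle."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


worker_inputs = [
    [("a", 1), ("a", 2), ("b", 3), ("a", 4)],
    [("b", 5), ("b", 6), ("a", 7)],
]
plain = shuffle_without_lifting(worker_inputs)
lifted = shuffle_with_lifting(worker_inputs)

print(len(plain), len(lifted))                   # 7 4
print(sum_per_key(plain) == sum_per_key(lifted))  # True
```

&lt;p>Both paths produce the same per-key totals, but the lifted path shuffles one record per key per worker instead of one record per input element.&lt;/p>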
&lt;h5 id="simple-combines">4.2.4.1. Simple combinations using simple functions&lt;/h5>
&lt;span class="language-yaml">
Beam YAML has the following built-in CombineFns: count, sum, min, max,
mean, any, all, group, and concat.
CombineFns from other languages can also be referenced,
as described in the &lt;a href="https://beam.apache.org/documentation/sdks/yaml-combine/">full docs on aggregation&lt;/a>.
&lt;/span>
The following example code shows a simple combine function.
&lt;span class="language-typescript">
Combining is done by modifying a grouping transform with the &lt;code>combining&lt;/code> method.
This method takes three parameters: the value to combine (either as a named
property of the input elements, or a function of the entire input),
the combining operation (either a binary function or a &lt;code>CombineFn&lt;/code>),
and finally a name for the combined value in the output object.
&lt;/span>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Sum a collection of Integer values. The function SumInts implements the interface SerializableFunction.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">SumInts&lt;/span> &lt;span class="kd">implements&lt;/span> &lt;span class="n">SerializableFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Integer&lt;/span> &lt;span class="nf">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">sum&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">0&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="kt">int&lt;/span> &lt;span class="n">item&lt;/span> &lt;span class="o">:&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sum&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="n">item&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">sum&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
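&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">100&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1000&lt;/span>&lt;span class="p">]&lt;/span>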
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">bounded_sum&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">values&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">bound&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">500&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">min&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">values&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">bound&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">small_sum&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bounded_sum&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># [500]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">large_sum&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bounded_sum&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">bound&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">5000&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># [1111]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">sumInts&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">a&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">v&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function2x1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sumInts&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">globallySumInts&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ints&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Combine&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">sumInts&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ints&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">boundedSum&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Bound&lt;/span> &lt;span class="kt">int&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">boundedSum&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">MergeAccumulators&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">sum&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">a&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">v&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bound&lt;/span> &lt;span class="p">&amp;gt;&lt;/span> &lt;span class="mi">0&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="nx">sum&lt;/span> &lt;span class="p">&amp;gt;&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bound&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bound&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">sum&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Combiner1&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">boundedSum&lt;/span>&lt;span class="p">{})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">globallyBoundedSumInts&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">bound&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ints&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Combine&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">boundedSum&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">Bound&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">bound&lt;/span>&lt;span class="p">},&lt;/span> &lt;span class="nx">ints&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">100&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1000&lt;/span>&lt;span class="p">]));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">pcoll&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">groupGlobally&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">combining&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">c&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">c&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">y&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">x&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">y&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;sum&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">combining&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">c&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">c&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">y&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">x&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="nx">y&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;product&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">expected&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">sum&lt;/span>: &lt;span class="kt">1111&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">product&lt;/span>: &lt;span class="kt">1000000&lt;/span> &lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Combine&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">language&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">python&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">group_by&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">animal&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">combine&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">biggest&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fn&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;apache_beam.transforms.combiners.TopCombineFn&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">n&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">2&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">value&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">weight&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-go">All Combiners should be registered using a generic &lt;code>register.CombinerX[...]&lt;/code>
function. This allows the Go SDK to infer an encoding from any inputs/outputs,
registers the Combiner for execution on remote runners, and optimizes the runtime
execution of the Combiner via reflection.&lt;/p>
&lt;p class="language-go">Combiner1 should be used when your accumulator, input, and output are all of the
same type. It can be called with &lt;code>register.Combiner1[T](&amp;amp;CustomCombiner{})&lt;/code> where &lt;code>T&lt;/code>
is the type of the input/accumulator/output.&lt;/p>
&lt;p class="language-go">Combiner2 should be used when your accumulator, input, and output are 2 distinct
types. It can be called with &lt;code>register.Combiner2[T1, T2](&amp;amp;CustomCombiner{})&lt;/code> where
&lt;code>T1&lt;/code> is the type of the accumulator and &lt;code>T2&lt;/code> is the other type.&lt;/p>
&lt;p class="language-go">Combiner3 should be used when your accumulator, input, and output are 3 distinct
types. It can be called with &lt;code>register.Combiner3[T1, T2, T3](&amp;amp;CustomCombiner{})&lt;/code>
where &lt;code>T1&lt;/code> is the type of the accumulator, &lt;code>T2&lt;/code> is the type of the input, and &lt;code>T3&lt;/code> is
the type of the output.&lt;/p>
&lt;h5 id="advanced-combines">4.2.4.2. Advanced combinations using CombineFn&lt;/h5>
&lt;p>For more complex combine functions, you can define a
&lt;span class="language-java language-py">subclass of&lt;/span>&lt;code>CombineFn&lt;/code>.
You should use a &lt;code>CombineFn&lt;/code> if the combine function requires a more sophisticated
accumulator, must perform additional pre- or post-processing, might change the
output type, or takes the key into account.&lt;/p>
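&lt;p>To illustrate why some combine functions need a more sophisticated accumulator,
the following plain-Python sketch compares two ways of computing a mean over data
that has been split into parts. It does not use Beam: the &lt;code>bundles&lt;/code> split and the
&lt;code>mean&lt;/code> helper are assumptions made for illustration only.&lt;/p>

```python
def mean(xs):
    """Arithmetic mean of a non-empty list (illustrative helper, not a Beam API)."""
    return sum(xs) / len(xs)

values = [1, 2, 3, 4, 5]
# Hypothetical split of the same values across two workers.
bundles = [[1, 2], [3, 4, 5]]

# Naive approach: apply the mean to each part, then to the partial results.
mean_of_means = mean([mean(b) for b in bundles])

# Accumulator approach: carry (sum, count) pairs and divide only at the end.
partials = [(sum(b), len(b)) for b in bundles]
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
accumulated_mean = total / count

print(mean_of_means)     # 2.75, differs from the true mean
print(accumulated_mean)  # 3.0, matches mean(values)
```

&lt;p>Because the naive version loses the size of each part, its result depends on how
the data happens to be split. Carrying a (sum, count) pair avoids this, and that
pair is the kind of accumulator a &lt;code>CombineFn&lt;/code> lets you define.&lt;/p>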
&lt;p>A general combining operation consists of five operations. When you create a
&lt;span class="language-java language-py">subclass of&lt;/span>
&lt;code>CombineFn&lt;/code>, you provide these operations by overriding the
corresponding methods. In the Go SDK, only &lt;code>MergeAccumulators&lt;/code> is a required
method; the others have a default interpretation based on the accumulator type. The
lifecycle methods are:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Create Accumulator&lt;/strong> creates a new &amp;ldquo;local&amp;rdquo; accumulator. In the
running example of computing a mean average, a local accumulator tracks the running
sum of values (the numerator of the final division) and the number of
values summed so far (the denominator). It may be called any number of
times in a distributed fashion.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Add Input&lt;/strong> adds an input element to an accumulator, returning the
accumulator value. In our example, it would update the sum and increment the
count. It may also be invoked in parallel.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Merge Accumulators&lt;/strong> merges several accumulators into a single accumulator;
this is how data in multiple accumulators is combined before the final
calculation. In the case of the mean average computation, the sums and counts
held by the separate accumulators are added together. It may be
called again on its outputs any number of times.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Extract Output&lt;/strong> performs the final computation. In the case of computing a
mean average, this means dividing the combined sum of all the values by the
number of values summed. It is called once on the final, merged accumulator.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Compact&lt;/strong> returns a more compact representation of the accumulator. This is
called before an accumulator is sent across the wire, and can be useful in
cases where values are buffered or otherwise lazily kept unprocessed when
added to the accumulator. Compact should return an equivalent, though
possibly modified, accumulator. In most cases, Compact is not necessary. For
a real-world example of using Compact, see the Python SDK implementation of
&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/combiners.py#L523">TopCombineFn&lt;/a>.&lt;/p>
&lt;/li>
&lt;/ol>
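&lt;p>The order in which these operations are invoked can be sketched in plain Python.
This is a simplified illustration, not how any particular runner is implemented:
&lt;code>MeanCombiner&lt;/code> is a stand-in class that does not inherit from Beam, and
&lt;code>run_combine&lt;/code> and the bundle layout are hypothetical names introduced here.&lt;/p>

```python
class MeanCombiner:
    """Plain-Python stand-in for a CombineFn (assumption: not a Beam class)."""

    def create_accumulator(self):
        return (0.0, 0)  # (running sum, count)

    def add_input(self, acc, value):
        total, count = acc
        return (total + value, count + 1)

    def merge_accumulators(self, accs):
        totals, counts = zip(*accs)
        return (sum(totals), sum(counts))

    def extract_output(self, acc):
        total, count = acc
        return total / count if count else float("nan")

    def compact(self, acc):
        return acc  # nothing is buffered, so there is nothing to compact


def run_combine(fn, bundles):
    """Hypothetical driver: one local accumulator per bundle, then a merge."""
    local_accs = []
    for bundle in bundles:
        acc = fn.create_accumulator()
        for value in bundle:
            acc = fn.add_input(acc, value)
        # Compact before the accumulator would be sent to another worker.
        local_accs.append(fn.compact(acc))
    merged = fn.merge_accumulators(local_accs)
    return fn.extract_output(merged)


result = run_combine(MeanCombiner(), [[1, 10], [100, 1000]])
print(result)  # 277.75
```

&lt;p>Splitting the same values differently, for example
&lt;code>[[1], [10, 100], [1000]]&lt;/code>, yields the same result in this sketch, which is the
property that allows the operations to be invoked in parallel.&lt;/p>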
&lt;p>The following example code shows how to define a &lt;code>CombineFn&lt;/code> that computes a
mean average:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">AverageFn&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">CombineFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">AverageFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Accum&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">Accum&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">sum&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">0&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">count&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">0&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Accum&lt;/span> &lt;span class="nf">createAccumulator&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Accum&lt;/span>&lt;span class="o">();&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Accum&lt;/span> &lt;span class="nf">addInput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Accum&lt;/span> &lt;span class="n">accum&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">accum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">sum&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">accum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">count&lt;/span>&lt;span class="o">++;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">accum&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Accum&lt;/span> &lt;span class="nf">mergeAccumulators&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Accum&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">accums&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Accum&lt;/span> &lt;span class="n">merged&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">createAccumulator&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">Accum&lt;/span> &lt;span class="n">accum&lt;/span> &lt;span class="o">:&lt;/span> &lt;span class="n">accums&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">merged&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">sum&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="n">accum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">sum&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">merged&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">count&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="n">accum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">count&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">merged&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Double&lt;/span> &lt;span class="nf">extractOutput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Accum&lt;/span> &lt;span class="n">accum&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="o">((&lt;/span>&lt;span class="kt">double&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">accum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">sum&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">accum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">count&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// No-op
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Accum&lt;/span> &lt;span class="nf">compact&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Accum&lt;/span> &lt;span class="n">accum&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="n">accum&lt;/span>&lt;span class="o">;&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">AverageFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">create_accumulator&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="mf">0.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">add_input&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">sum_count&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">input&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">count&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">sum_count&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">sum&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nb">input&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">count&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">merge_accumulators&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">accumulators&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sums&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">counts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">zip&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="n">accumulators&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">sum&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">sums&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nb">sum&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">counts&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">extract_output&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">sum_count&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">count&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">sum_count&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">sum&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">count&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="n">count&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="nb">float&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;NaN&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">compact&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">accumulator&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># No-op&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">accumulator&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">averageFn&lt;/span> &lt;span class="kd">struct&lt;/span>&lt;span class="p">{}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">averageAccum&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Count&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">Sum&lt;/span> &lt;span class="kt">int&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">averageFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">CreateAccumulator&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="nx">averageAccum&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">averageAccum&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">averageFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">AddInput&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span> &lt;span class="nx">averageAccum&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">averageAccum&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">averageAccum&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">Count&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">a&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Count&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">Sum&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">a&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Sum&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">v&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">averageFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">MergeAccumulators&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="nx">averageAccum&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">averageAccum&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">averageAccum&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">Count&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">a&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Count&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">v&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Count&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">Sum&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">a&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Sum&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">v&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Sum&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">averageFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ExtractOutput&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span> &lt;span class="nx">averageAccum&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">float64&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">a&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Count&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">math&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NaN&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">float64&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Sum&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="nb">float64&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Count&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">averageFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">Compact&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span> &lt;span class="nx">averageAccum&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">averageAccum&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// No-op
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nx">a&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Combiner3&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">averageAccum&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">float64&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">averageFn&lt;/span>&lt;span class="p">{})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">meanCombineFn&lt;/span>: &lt;span class="kt">beam.CombineFn&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">number&lt;/span>&lt;span class="err">,&lt;/span> &lt;span class="err">[&lt;/span>&lt;span class="na">number&lt;/span>&lt;span class="err">,&lt;/span> &lt;span class="na">number&lt;/span>&lt;span class="err">],&lt;/span> &lt;span class="na">number&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">createAccumulator&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">()&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">addInput&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">([&lt;/span>&lt;span class="nx">sum&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">count&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="kt">number&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">number&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="nx">i&lt;/span>: &lt;span class="kt">number&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">sum&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">i&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">count&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">mergeAccumulators&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">accumulators&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="kt">number&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">number&lt;/span>&lt;span class="p">][])&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">accumulators&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">reduce&lt;/span>&lt;span class="p">(([&lt;/span>&lt;span class="nx">sum0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">count0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="nx">sum1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">count1&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">sum0&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">sum1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">count0&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">count1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">]),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">extractOutput&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">([&lt;/span>&lt;span class="nx">sum&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">count&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="kt">number&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">number&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">sum&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="nx">count&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">};&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
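&lt;p>To see how the four accumulator methods fit together, the following is a minimal plain-Python sketch rather than Beam code. The names &lt;code>AverageSketch&lt;/code> and &lt;code>combine_in_bundles&lt;/code> are hypothetical, and the driver only approximates what a runner might do: accumulate each bundle of elements independently, merge the partial accumulators, and then extract the final output.&lt;/p>

```python
class AverageSketch:
    """Plain-Python stand-in with the same four methods as the mean CombineFn."""

    def create_accumulator(self):
        # (running sum, element count)
        return (0.0, 0)

    def add_input(self, sum_count, value):
        total, count = sum_count
        return total + value, count + 1

    def merge_accumulators(self, accumulators):
        sums, counts = zip(*accumulators)
        return sum(sums), sum(counts)

    def extract_output(self, sum_count):
        total, count = sum_count
        return total / count if count else float('NaN')


def combine_in_bundles(fn, bundles):
    """Hypothetical driver: accumulate each bundle separately, then merge."""
    partials = []
    for bundle in bundles:
        acc = fn.create_accumulator()
        for value in bundle:
            acc = fn.add_input(acc, value)
        partials.append(acc)
    return fn.extract_output(fn.merge_accumulators(partials))


# Three bundles stand in for elements processed on different workers.
mean = combine_in_bundles(AverageSketch(), [[1, 2], [3], [4, 5, 6]])
print(mean)  # 3.5
```

&lt;p>Because the accumulator carries both the sum and the count, the result does not depend on how the elements are split into bundles, which is what lets a runner combine partial results in any grouping.&lt;/p>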
&lt;span class="language-go">
&lt;/span>
&lt;h5 id="combining-pcollection">4.2.4.3. Combining a PCollection into a single value&lt;/h5>
&lt;p>Use a global combine to transform all of the elements in a given &lt;code>PCollection&lt;/code>
into a single value, represented in your pipeline as a new &lt;code>PCollection&lt;/code>
containing one element. The following example code shows how to apply the
Beam-provided sum combine function to produce a single sum value for a &lt;code>PCollection&lt;/code>
of integers.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Sum.SumIntegerFn() combines the elements in the input PCollection. The resulting PCollection, called sum,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// contains one value: the sum of all the elements in the input PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">sum&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Combine&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">globally&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">Sum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">SumIntegerFn&lt;/span>&lt;span class="o">()));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># sum combines the elements in the input PCollection.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The resulting PCollection, called result, contains one value: the sum of all&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># the elements in the input PCollection.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">average&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Combine&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">averageFn&lt;/span>&lt;span class="p">{},&lt;/span> &lt;span class="nx">ints&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">4&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">5&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">6&lt;/span>&lt;span class="p">]));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">pcoll&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">groupGlobally&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nx">combining&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">c&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">c&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">meanCombineFn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;mean&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Combine&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">group_by&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">combine&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">weight&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">sum&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h5 id="combine-global-windowing">4.2.4.4. Combine and global windowing&lt;/h5>
&lt;p>If your input &lt;code>PCollection&lt;/code> uses the default global windowing, the default
behavior is to return a &lt;code>PCollection&lt;/code> containing one item, even when the
input is empty. For an empty input, that item&amp;rsquo;s value comes from the initial
accumulator of the combine function that you specified when
applying &lt;code>Combine&lt;/code>. For example, the Beam-provided sum combine function returns
a zero value (the sum of an empty input), while the min combine function returns
a maximal or infinite value.&lt;/p>
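&lt;p>As a rough illustration of where those default values come from, here is a plain-Python sketch, not Beam code. The helper &lt;code>default_output&lt;/code> is hypothetical, and the two lambdas passed to it only mimic the accumulator methods of a sum-like and a min-like combine function; the sketch assumes the default is obtained by extracting the output of an accumulator that never received any input.&lt;/p>

```python
def default_output(create_accumulator, extract_output):
    """Value produced for an empty input: extract the untouched accumulator."""
    return extract_output(create_accumulator())


# Sum-like combiner: the initial accumulator is 0.
sum_default = default_output(lambda: 0, lambda acc: acc)

# Min-like combiner: the initial accumulator is positive infinity.
min_default = default_output(lambda: float('inf'), lambda acc: acc)

print(sum_default)  # 0
print(min_default)  # inf
```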
&lt;p>To have &lt;code>Combine&lt;/code> instead return an empty &lt;code>PCollection&lt;/code> if the input is empty,
specify &lt;code>.withoutDefaults&lt;/code> when you apply your &lt;code>Combine&lt;/code> transform, as in the
following code example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">sum&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Combine&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">globally&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">Sum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">SumIntegerFn&lt;/span>&lt;span class="o">()).&lt;/span>&lt;span class="na">withoutDefaults&lt;/span>&lt;span class="o">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">without_defaults&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">returnSideOrDefault&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">d&lt;/span> &lt;span class="kt">float64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">iter&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="kt">float64&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">bool&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">float64&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">var&lt;/span> &lt;span class="nx">c&lt;/span> &lt;span class="kt">float64&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nf">iter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">c&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Side input has a value, so return it.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nx">c&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Otherwise, return the default
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nx">d&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function2x1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">returnSideOrDefault&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">globallyAverageWithDefault&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ints&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The Go SDK has no helper function for setting combine defaults.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">average&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Combine&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">averageFn&lt;/span>&lt;span class="p">{},&lt;/span> &lt;span class="nx">ints&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// To add a default value:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">defaultValue&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">float64&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">returnSideOrDefault&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">defaultValue&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SideInput&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">Input&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">average&lt;/span>&lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;alice&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">accuracy&lt;/span>: &lt;span class="kt">1.0&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;bob&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">accuracy&lt;/span>: &lt;span class="kt">0.99&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;eve&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">accuracy&lt;/span>: &lt;span class="kt">0.5&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;eve&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">accuracy&lt;/span>: &lt;span class="kt">0.25&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">pcoll&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">groupGlobally&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">combining&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;accuracy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">combiners&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">mean&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;mean&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">combining&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;accuracy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">combiners&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">max&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;max&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">expected&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[{&lt;/span> &lt;span class="nx">max&lt;/span>: &lt;span class="kt">1.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">mean&lt;/span>: &lt;span class="kt">0.685&lt;/span> &lt;span class="p">}];&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h5 id="combine-non-global-windowing">4.2.4.5. Combine and non-global windowing&lt;/h5>
&lt;span class="language-java language-py">
&lt;p>If your &lt;code>PCollection&lt;/code> uses any non-global windowing function, Beam does not
provide the default behavior. You must specify one of the following options when
applying &lt;code>Combine&lt;/code>:&lt;/p>
&lt;ul>
&lt;li>Specify &lt;code>.withoutDefaults&lt;/code>, where windows that are empty in the input
&lt;code>PCollection&lt;/code> will likewise be empty in the output collection.&lt;/li>
&lt;li>Specify &lt;code>.asSingletonView&lt;/code>, in which case the output is immediately converted to a
&lt;code>PCollectionView&lt;/code>, which will provide a default value for each empty window
when used as a side input. You&amp;rsquo;ll generally only need to use this option if
the result of your pipeline&amp;rsquo;s &lt;code>Combine&lt;/code> is to be used as a side input later in
the pipeline.&lt;/li>
&lt;/ul>
&lt;/span>
&lt;p class="language-go">If your &lt;code>PCollection&lt;/code> uses any non-global windowing function, the Beam Go SDK
behaves the same way as with global windowing. Windows that are empty in the input
&lt;code>PCollection&lt;/code> will likewise be empty in the output collection.&lt;/p>
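&lt;p>The difference between the two Java and Python options can be sketched in plain Python. This is a minimal model under stated assumptions, not Beam API: &lt;code>combine_per_window&lt;/code>, &lt;code>lookup_with_default&lt;/code>, and the 10-second &lt;code>WINDOW_SIZE&lt;/code> are hypothetical names chosen for illustration. Combining without defaults produces output only for windows that contain elements, while a singleton-view style lookup falls back to a default for an empty window.&lt;/p>

```python
from collections import defaultdict

WINDOW_SIZE = 10  # hypothetical fixed-window width, in seconds


def combine_per_window(timestamped_values, combine_fn):
    """Group (timestamp, value) pairs into fixed windows and combine each one.

    Only windows that received at least one element appear in the result,
    which models the "without defaults" behavior.
    """
    windows = defaultdict(list)
    for timestamp, value in timestamped_values:
        windows[timestamp // WINDOW_SIZE].append(value)
    return {index: combine_fn(values) for index, values in windows.items()}


def lookup_with_default(combined, window_index, default):
    """Model a singleton-view lookup: empty windows fall back to a default."""
    return combined.get(window_index, default)


# Elements land in windows 0 and 2; window 1 receives nothing.
events = [(1, 4), (7, 6), (23, 5)]
sums = combine_per_window(events, sum)

print(sums)                             # {0: 10, 2: 5}
print(lookup_with_default(sums, 1, 0))  # 0
```

&lt;p>In this model, window 1 is absent from &lt;code>sums&lt;/code> and yields a value only when a consumer looks it up with an explicit default.&lt;/p>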
&lt;h5 id="combining-values-in-a-keyed-pcollection">4.2.4.6. Combining values in a keyed PCollection&lt;/h5>
&lt;p>After creating a keyed PCollection (for example, by using a &lt;code>GroupByKey&lt;/code>
transform), a common pattern is to combine the collection of values associated
with each key into a single, merged value. Drawing on the previous example from
&lt;code>GroupByKey&lt;/code>, a key-grouped &lt;code>PCollection&lt;/code> called &lt;code>groupedWords&lt;/code> looks like this:&lt;/p>
&lt;pre tabindex="0">&lt;code> cat, [1,5,9]
dog, [5,2]
and, [1,2,6]
jump, [3]
tree, [2]
...
&lt;/code>&lt;/pre>&lt;p>In the above &lt;code>PCollection&lt;/code>, each element has a string key (for example, &amp;ldquo;cat&amp;rdquo;)
and an iterable of integers for its value (in the first element, containing [1,
5, 9]). If your pipeline&amp;rsquo;s next processing step combines the values (rather than
considering them individually), you can combine the iterable of integers to
create a single, merged value to be paired with each key. This pattern of a
&lt;code>GroupByKey&lt;/code> followed by merging the collection of values is equivalent to
Beam&amp;rsquo;s Combine PerKey transform. The combine function you supply to Combine
PerKey must be an associative reduction function or a
&lt;span class="language-java language-py">subclass of&lt;/span> &lt;code>CombineFn&lt;/code>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// PCollection is grouped by key and the Double values associated with each key are combined into a Double.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">salesRecords&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">totalSalesPerPerson&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">salesRecords&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Combine&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">perKey&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">Sum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">SumDoubleFn&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The combined value is of a different type than the original collection of values per key. PCollection has
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// keys of type String and values of type Integer, and the combined value is a Double.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">playerAccuracy&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">avgAccuracyPerPlayer&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">playerAccuracy&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Combine&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">perKey&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">MeanInts&lt;/span>&lt;span class="o">())));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># PCollection is grouped by key and the numeric values associated with each key&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># are averaged into a float.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">player_accuracies&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">avg_accuracy_per_player&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">player_accuracies&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombinePerKey&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">combiners&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MeanCombineFn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// PCollection is grouped by key and the numeric values associated with each key
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// are averaged into a float64.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">playerAccuracies&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="c1">// PCollection&amp;lt;string,int&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">avgAccuracyPerPlayer&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">stats&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">MeanPerKey&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">playerAccuracies&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// avgAccuracyPerPlayer is a PCollection&amp;lt;string,float64&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;alice&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">accuracy&lt;/span>: &lt;span class="kt">1.0&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;bob&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">accuracy&lt;/span>: &lt;span class="kt">0.99&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;eve&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">accuracy&lt;/span>: &lt;span class="kt">0.5&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;eve&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">accuracy&lt;/span>: &lt;span class="kt">0.25&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">pcoll&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">groupBy&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;player&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">combining&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;accuracy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">combiners&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">mean&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;mean&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">combining&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;accuracy&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">combiners&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">max&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;max&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">expected&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;alice&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">mean&lt;/span>: &lt;span class="kt">1.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">max&lt;/span>: &lt;span class="kt">1.0&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;bob&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">mean&lt;/span>: &lt;span class="kt">0.99&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">max&lt;/span>: &lt;span class="kt">0.99&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">player&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;eve&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">mean&lt;/span>: &lt;span class="kt">0.375&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">max&lt;/span>: &lt;span class="kt">0.5&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">];&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Combine&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">group_by&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="l">animal]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">combine&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">total_weight&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fn&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">sum&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">value&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">weight&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">average_weight&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fn&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">mean&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">value&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">weight&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="flatten">4.2.5. Flatten&lt;/h4>
&lt;p>&lt;span class="language-java">&lt;a href="https://beam.apache.org/releases/javadoc/2.56.0/index.html?org/apache/beam/sdk/transforms/Flatten.html">&lt;code>Flatten&lt;/code>&lt;/a>&lt;/span>
&lt;span class="language-py">&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py">&lt;code>Flatten&lt;/code>&lt;/a>&lt;/span>
&lt;span class="language-go">&lt;a href="https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/flatten.go">&lt;code>Flatten&lt;/code>&lt;/a>&lt;/span>
&lt;span class="language-typescript">&lt;code>Flatten&lt;/code>&lt;/span>
is a Beam transform for &lt;code>PCollection&lt;/code> objects that store the same data type.
&lt;code>Flatten&lt;/code> merges multiple &lt;code>PCollection&lt;/code> objects into a single logical
&lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>The following example shows how to apply a &lt;code>Flatten&lt;/code> transform to merge multiple
&lt;code>PCollection&lt;/code> objects.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Flatten takes a PCollectionList of PCollection objects of a given type.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Returns a single PCollection that contains all of the elements in the PCollection objects in that list.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">pc1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">pc2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">pc3&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollectionList&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">collections&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PCollectionList&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pc1&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pc2&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pc3&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">merged&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">collections&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Flatten&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">pCollections&lt;/span>&lt;span class="o">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Flatten takes a tuple of PCollection objects.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Returns a single PCollection that contains all of the elements in the PCollection objects in that tuple.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">merged&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="n">pcoll1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pcoll2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pcoll3&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># A list of tuples can be &amp;#34;piped&amp;#34; directly into a Flatten transform.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Flatten&lt;/span>&lt;span class="p">())&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Flatten accepts any number of PCollections of the same element type.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Returns a single PCollection that contains all of the elements in input PCollections.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">merged&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Flatten&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">pcol1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">pcol2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">pcol3&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Flatten taken an array of PCollection objects, wrapped in beam.P(...)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Returns a single PCollection that contains a union of all of the elements in all input PCollections.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">fib&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">withName&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;createFib&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">5&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">8&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">pow&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">withName&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;createPow&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">8&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">16&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">32&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">P&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="nx">fib&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">pow&lt;/span>&lt;span class="p">]).&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">flatten&lt;/span>&lt;span class="p">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">- &lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Flatten&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="l">SomeProducingTransform, AnotherProducingTransform]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-yaml">In Beam YAML, explicit flattens are usually not needed, because you can list
multiple inputs for any transform and they are implicitly flattened.&lt;/p>
&lt;h5 id="data-encoding-merged-collections">4.2.5.1. Data encoding in merged collections&lt;/h5>
&lt;p>By default, the coder for the output &lt;code>PCollection&lt;/code> is the same as the coder for
the first &lt;code>PCollection&lt;/code> in the input &lt;code>PCollectionList&lt;/code>. However, the input
&lt;code>PCollection&lt;/code> objects can each use different coders, as long as they all contain
the same data type in your chosen language.&lt;/p>
&lt;h5 id="merging-windowed-collections">4.2.5.2. Merging windowed collections&lt;/h5>
&lt;p>When using &lt;code>Flatten&lt;/code> to merge &lt;code>PCollection&lt;/code> objects that have a windowing
strategy applied, all of the &lt;code>PCollection&lt;/code> objects you want to merge must use a
compatible windowing strategy and window sizing. For example, the collections
you&amp;rsquo;re merging must all use identical 5-minute fixed windows, or must all use
identical 4-minute sliding windows starting every 30 seconds.&lt;/p>
&lt;p>If your pipeline attempts to use &lt;code>Flatten&lt;/code> to merge &lt;code>PCollection&lt;/code> objects with
incompatible windows, Beam generates an &lt;code>IllegalStateException&lt;/code> error when your
pipeline is constructed.&lt;/p>
&lt;h4 id="partition">4.2.6. Partition&lt;/h4>
&lt;p>&lt;span class="language-java">&lt;a href="https://beam.apache.org/releases/javadoc/2.56.0/index.html?org/apache/beam/sdk/transforms/Partition.html">&lt;code>Partition&lt;/code>&lt;/a>&lt;/span>
&lt;span class="language-py">&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py">&lt;code>Partition&lt;/code>&lt;/a>&lt;/span>
&lt;span class="language-go">&lt;a href="https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/partition.go">&lt;code>Partition&lt;/code>&lt;/a>&lt;/span>
&lt;span class="language-typescript">&lt;code>Partition&lt;/code>&lt;/span>
is a Beam transform for &lt;code>PCollection&lt;/code> objects that store the same data
type. &lt;code>Partition&lt;/code> splits a single &lt;code>PCollection&lt;/code> into a fixed number of smaller
collections.&lt;/p>
&lt;p class="language-typescript">In the TypeScript SDK, the &lt;code>Split&lt;/code> transform is often more natural to use.&lt;/p>
&lt;p>&lt;code>Partition&lt;/code> divides the elements of a &lt;code>PCollection&lt;/code> according to a partitioning
function that you provide. The partitioning function contains the logic that
determines how to split up the elements of the input &lt;code>PCollection&lt;/code> into each
resulting partition &lt;code>PCollection&lt;/code>. The number of partitions must be determined
at graph construction time. You can, for example, pass the number of partitions
as a command-line option at runtime (which will then be used to build your
pipeline graph), but you cannot determine the number of partitions
mid-pipeline (based on data calculated after your pipeline graph is constructed,
for instance).&lt;/p>
&lt;p>The following example divides a &lt;code>PCollection&lt;/code> into percentile groups.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Provide an int value with the desired number of result partitions, and a PartitionFn that represents the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// partitioning function. In this example, we define the PartitionFn in-line. Returns a PCollectionList
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// containing each of the resulting partitions as individual PCollection objects.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Student&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">students&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Split students up into 10 partitions, by percentile:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollectionList&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Student&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">studentsByPercentile&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">students&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Partition&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">10&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">PartitionFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Student&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="nf">partitionFor&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Student&lt;/span> &lt;span class="n">student&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="n">numPartitions&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">student&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getPercentile&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="c1">// 0..99
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">numPartitions&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">100&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}}));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// You can extract each partition from the PCollectionList using the get method, as follows:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Student&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">fortiethPercentile&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">studentsByPercentile&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">get&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">4&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Provide an int value with the desired number of result partitions, and a partitioning function (partition_fn in this example).&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Returns a tuple of PCollection objects containing each of the resulting partitions as individual PCollection objects.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">students&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">partition_fn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">student&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">num_partitions&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">int&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">get_percentile&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">student&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">num_partitions&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="mi">100&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">by_decile&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">students&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Partition&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">partition_fn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">10&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># You can extract each partition from the tuple of PCollection objects as follows:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">fortieth_percentile&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">by_decile&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">4&lt;/span>&lt;span class="p">]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">decileFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">student&lt;/span> &lt;span class="nx">Student&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">int&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">float64&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">student&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Percentile&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="nb">float64&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function1x1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">decileFn&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Partition returns a slice of PCollections
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">studentsByPercentile&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Partition&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">decileFn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">students&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Each partition can be extracted by indexing into the slice.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">fortiethPercentile&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">studentsByPercentile&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">4&lt;/span>&lt;span class="p">]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">deciles&lt;/span>: &lt;span class="kt">PCollection&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">Student&lt;/span>&lt;span class="p">&amp;gt;[]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">students&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">partition&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="nx">student&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">numPartitions&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">Math&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">floor&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">getPercentile&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">student&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="mi">100&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="nx">numPartitions&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="mi">10&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">topDecile&lt;/span>: &lt;span class="kt">PCollection&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">Student&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">deciles&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">9&lt;/span>&lt;span class="p">];&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Partition&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">by&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">str(percentile // 10)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">language&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">python&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">outputs&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;0&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;2&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;3&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;4&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;5&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;6&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;7&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;8&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;9&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;10&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-yaml">Note that in Beam YAML, &lt;code>PCollections&lt;/code> are partitioned via string rather than integer values.&lt;/p>
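&lt;p class="language-py">The partitioning function is ordinary code that maps an element to an index, so its contract can be checked without running a pipeline. The sketch below is plain Python, not Beam API: students are assumed to be dicts with a hypothetical &lt;code>percentile&lt;/code> key in the range 0..99, and the helper &lt;code>split_locally&lt;/code> is a made-up local stand-in that only mimics &lt;code>Partition&lt;/code> to show that every returned index must fall in the range 0..num_partitions-1.&lt;/p>

```python
def partition_fn(student, num_partitions):
    # Map a percentile in 0..99 onto a partition index in 0..num_partitions-1.
    # Integer arithmetic avoids floating-point rounding surprises.
    return student["percentile"] * num_partitions // 100


def split_locally(elements, fn, num_partitions):
    # Hypothetical local stand-in for Partition: one list per partition.
    partitions = [[] for _ in range(num_partitions)]
    for element in elements:
        index = fn(element, num_partitions)
        if index not in range(num_partitions):
            raise ValueError(f"partition index {index} out of range")
        partitions[index].append(element)
    return partitions


students = [
    {"name": "a", "percentile": 5},
    {"name": "b", "percentile": 42},
    {"name": "c", "percentile": 99},
]
by_decile = split_locally(students, partition_fn, 10)
fortieth_percentile = by_decile[4]  # holds the student named "b"
```

&lt;p class="language-py">Checking the function against boundary values such as 0 and 99 before wiring it into a pipeline is a cheap way to catch an out-of-range index early.&lt;/p>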
&lt;h3 id="requirements-for-writing-user-code-for-beam-transforms">4.3. Requirements for writing user code for Beam transforms&lt;/h3>
&lt;p>When you build user code for a Beam transform, you should keep in mind the
distributed nature of execution. For example, there might be many copies of your
function running on a lot of different machines in parallel, and those copies
function independently, without communicating or sharing state with any of the
other copies. Depending on the Pipeline Runner and processing back-end you
choose for your pipeline, each copy of your user code function may be retried or
run multiple times. As such, you should be cautious about including things like
state dependency in your user code.&lt;/p>
&lt;p>In general, your user code must fulfill at least these requirements:&lt;/p>
&lt;ul>
&lt;li>Your function object must be &lt;strong>serializable&lt;/strong>.&lt;/li>
&lt;li>Your function object must be &lt;strong>thread-compatible&lt;/strong>. Be aware that &lt;em>the
Beam SDKs are not thread-safe&lt;/em>.&lt;/li>
&lt;/ul>
&lt;p>In addition, it&amp;rsquo;s recommended that you make your function object &lt;strong>idempotent&lt;/strong>.
Non-idempotent functions are supported by Beam, but require additional
thought to ensure correctness when there are external side effects.&lt;/p>
&lt;span class="language-java language-py">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> These requirements apply to subclasses of &lt;code>DoFn&lt;/code> (a function object
used with the &lt;a href="#pardo">ParDo&lt;/a> transform), &lt;code>CombineFn&lt;/code> (a function object used
with the &lt;a href="#combine">Combine&lt;/a> transform), and &lt;code>WindowFn&lt;/code> (a function object
used with the &lt;a href="#windowing">Window&lt;/a> transform).&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;span class="language-go">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> These requirements apply to &lt;code>DoFn&lt;/code>s (function objects
used with the &lt;a href="#pardo">ParDo&lt;/a> transform), &lt;code>CombineFn&lt;/code>s (function objects used
with the &lt;a href="#combine">Combine&lt;/a> transform), and &lt;code>WindowFn&lt;/code>s (function objects
used with the &lt;a href="#windowing">Window&lt;/a> transform).&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;h4 id="user-code-serializability">4.3.1. Serializability&lt;/h4>
&lt;p>Any function object you provide to a transform must be &lt;strong>fully serializable&lt;/strong>.
This is because a copy of the function needs to be serialized and transmitted to
a remote worker in your processing cluster.
&lt;span class="language-java language-py">The base classes for user code, such
as &lt;code>DoFn&lt;/code>, &lt;code>CombineFn&lt;/code>, and &lt;code>WindowFn&lt;/code>, already implement &lt;code>Serializable&lt;/code>;
however, your subclass must not add any non-serializable members.&lt;/span>
&lt;span class="language-go">Funcs are serializable as long as
they are registered with &lt;code>register.FunctionXxY&lt;/code> (for simple functions) or
&lt;code>register.DoFnXxY&lt;/code> (for structural DoFns), and are not closures. Structural
&lt;code>DoFn&lt;/code>s will have all exported fields serialized. Unexported fields cannot
be serialized and are silently ignored.&lt;/span>
&lt;span class="language-typescript">
The TypeScript SDK uses &lt;a href="https://github.com/nokia/ts-serialize-closures">ts-serialize-closures&lt;/a>
to serialize functions (and other objects).
This works out of the box for functions that are not closures, and also works
for closures as long as the function in question (and any closures it references)
are compiled with the
&lt;a href="https://github.com/apache/beam/blob/master/sdks/typescript/tsconfig.json">&lt;code>ts-closure-transform&lt;/code> hooks&lt;/a>
(e.g. by using &lt;code>ttsc&lt;/code> in place of &lt;code>tsc&lt;/code>).
One can alternatively call
&lt;code>requireForSerialization(&amp;quot;importableModuleDefiningFunc&amp;quot;, {func})&lt;/code>
to &lt;a href="https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/serialization.ts">register a function directly&lt;/a> by name, which can be less error-prone.
Note that if, as is often the case in JavaScript, &lt;code>func&lt;/code> returns objects that
contain closures, it is not sufficient to register &lt;code>func&lt;/code> alone; its return
value must also be registered if it is used.&lt;/span>&lt;/p>
&lt;p>Some other serializability factors you should keep in mind are:&lt;/p>
&lt;ul>
&lt;li>&lt;span class="language-java language-py">Transient&lt;/span>&lt;span class="language-go">Unexported&lt;/span>
fields in your function object are &lt;em>not&lt;/em> transmitted to worker
instances, because they are not automatically serialized.&lt;/li>
&lt;li>Avoid loading a field with a large amount of data before serialization.&lt;/li>
&lt;li>Individual instances of your function object cannot share data.&lt;/li>
&lt;li>Mutating a function object after it gets applied will have no effect.&lt;/li>
&lt;/ul>
&lt;span class="language-java">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> Take care when declaring your function object inline by using an anonymous
inner class instance. In a non-static context, your inner class instance will
implicitly contain a pointer to the enclosing class and that class&amp;rsquo;s state.
That enclosing class will also be serialized, and thus the same considerations
that apply to the function object itself also apply to this outer class.&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;span class="language-go">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> There&amp;rsquo;s no way to detect if a function is a closure. Closures will cause
runtime errors and pipeline failures. Avoid using anonymous functions when possible.&lt;/p>
&lt;/blockquote>
&lt;/span>
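&lt;p class="language-py">The points above can be illustrated with a small round trip. This is a sketch under stated assumptions, not the Beam implementation: the standard &lt;code>pickle&lt;/code> module stands in for whatever serializer the SDK actually uses, and &lt;code>LengthFilter&lt;/code> is a hypothetical function object rather than a real &lt;code>DoFn&lt;/code> subclass. It shows that a mutation made after serialization never reaches the worker copy, and that a non-serializable member (here a lock) has to be created after deserialization.&lt;/p>

```python
import pickle
import threading


class LengthFilter:
    # Hypothetical function object standing in for a DoFn subclass.

    def __init__(self, max_length):
        self.max_length = max_length  # plain data: serializes cleanly
        self.lock = None              # filled in on the worker by setup()

    def setup(self):
        # Create non-serializable resources after deserialization.
        self.lock = threading.Lock()

    def process(self, word):
        with self.lock:
            return [word] if self.max_length >= len(word) else []


fn = LengthFilter(max_length=5)
payload = pickle.dumps(fn)           # the copy that would be sent to a worker
fn.max_length = 100                  # mutation after serialization

worker_copy = pickle.loads(payload)  # still carries max_length == 5
worker_copy.setup()
kept = worker_copy.process("beam") + worker_copy.process("pipeline")

# Holding the lock as a member at serialization time fails instead.
fn.setup()
try:
    pickle.dumps(fn)
    lock_pickled = True
except TypeError:
    lock_pickled = False
```

&lt;p class="language-py">Keeping heavyweight or non-serializable state out of the constructor also keeps the serialized payload small, which matches the advice to avoid loading a field with a large amount of data before serialization.&lt;/p>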
&lt;h4 id="user-code-thread-compatibility">4.3.2. Thread-compatibility&lt;/h4>
&lt;p>Your function object should be thread-compatible. Each instance of your function
object is accessed by a single thread at a time on a worker instance, unless you
explicitly create your own threads. Note, however, that &lt;strong>the Beam SDKs are not
thread-safe&lt;/strong>. If you create your own threads in your user code, you must
provide your own synchronization. &lt;span class="language-java"> Note that static members in your function
object are not passed to worker instances and that multiple instances of your
function may be accessed from different threads.&lt;/span>&lt;/p>
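&lt;p class="language-py">If user code does start its own threads, the synchronization has to come from that code. The following is a minimal plain-Python sketch of the idea; &lt;code>CountingFn&lt;/code> and its methods are hypothetical names used for illustration and are not Beam API. A lock guards the shared counter that the helper threads update.&lt;/p>

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CountingFn:
    # Hypothetical function object that fans work out to its own threads.

    def __init__(self):
        self.seen = 0
        self.lock = None

    def setup(self):
        # Created here rather than in __init__ because a lock
        # cannot be serialized along with the function object.
        self.lock = threading.Lock()

    def record(self, _item):
        # The read-modify-write on self.seen is guarded by the lock.
        with self.lock:
            self.seen += 1

    def process(self, batch):
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(self.record, batch))
        return self.seen


fn = CountingFn()
fn.setup()
total = fn.process(range(1000))
```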
&lt;h4 id="user-code-idempotence">4.3.3. Idempotence&lt;/h4>
&lt;p>It&amp;rsquo;s recommended that you make your function object idempotent, that is, able to
be repeated or retried as often as necessary without causing unintended side
effects. Non-idempotent functions are supported; however, the Beam model provides
no guarantees as to the number of times your user code might be invoked or retried.
Keeping your function object idempotent therefore keeps your pipeline&amp;rsquo;s output
deterministic, and makes your transforms&amp;rsquo; behavior more predictable and easier to debug.&lt;/p>
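&lt;p class="language-py">The difference shows up as soon as an element is processed more than once. The sketch below is plain Python with hypothetical names (&lt;code>append_write&lt;/code>, &lt;code>keyed_write&lt;/code>, and a dict standing in for an external sink); it simulates a runner retrying the same element three times and compares an append-style side effect with a write keyed by a stable record ID.&lt;/p>

```python
def append_write(sink, record):
    # Not idempotent: every retry adds another copy of the record.
    sink.setdefault("rows", []).append(record)


def keyed_write(sink, record):
    # Idempotent: a retry overwrites the same key with the same value.
    sink[record["id"]] = record


record = {"id": "order-17", "total": 30}
appended, keyed = {}, {}
for _attempt in range(3):  # simulate the runner retrying the same element
    append_write(appended, record)
    keyed_write(keyed, record)

duplicates = len(appended["rows"])  # one copy per attempt
unique = len(keyed)                 # a single entry regardless of retries
```

&lt;p class="language-py">Writing under a key derived from the element itself is one common way to make an external side effect safe to retry; whether it fits depends on what the sink supports.&lt;/p>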
&lt;h3 id="side-inputs">4.4. Side inputs&lt;/h3>
&lt;p>In addition to the main input &lt;code>PCollection&lt;/code>, you can provide additional inputs
to a &lt;code>ParDo&lt;/code> transform in the form of side inputs. A side input is an additional
input that your &lt;code>DoFn&lt;/code> can access each time it processes an element in the input
&lt;code>PCollection&lt;/code>. When you specify a side input, you create a view of some other
data that can be read from within the &lt;code>ParDo&lt;/code> transform&amp;rsquo;s &lt;code>DoFn&lt;/code> while processing
each element.&lt;/p>
&lt;p>Side inputs are useful if your &lt;code>ParDo&lt;/code> needs to inject additional data when
processing each element in the input &lt;code>PCollection&lt;/code>, but the additional data
needs to be determined at runtime (and not hard-coded). Such values might be
determined by the input data, or depend on a different branch of your pipeline.&lt;/p>
&lt;p class="language-go">All side input iterables should be registered using a generic &lt;code>register.IterX[...]&lt;/code>
function. This optimizes runtime execution of the iterable.&lt;/p>
&lt;h4 id="side-inputs-pardo">4.4.1. Passing side inputs to ParDo&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Pass side inputs to your ParDo transform by invoking .withSideInputs.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Inside your DoFn, access the side input by using the method DoFn.ProcessContext.sideInput.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The input PCollection to ParDo.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// A PCollection of word lengths that we&amp;#39;ll combine into a single value.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">wordLengths&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span> &lt;span class="c1">// Singleton PCollection
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Create a singleton PCollectionView from wordLengths using Combine.globally(...).asSingletonView().
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">PCollectionView&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">maxWordLengthCutOffView&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">wordLengths&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Combine&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">globally&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Max&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ofIntegers&lt;/span>&lt;span class="o">()).&lt;/span>&lt;span class="na">asSingletonView&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Apply a ParDo that takes maxWordLengthCutOffView as a side input.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">wordsBelowCutOff&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">words&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// In our DoFn, access the side input.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="n">lengthCutOff&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">sideInput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">maxWordLengthCutOffView&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">length&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="n">lengthCutOff&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}).&lt;/span>&lt;span class="na">withSideInputs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">maxWordLengthCutOffView&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Side inputs are available as extra arguments in the DoFn&amp;#39;s process method or Map / FlatMap&amp;#39;s callable.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Optional, positional, and keyword arguments are all supported. Deferred arguments are unwrapped into their&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># actual values. For example, using pvalue.AsIter(pcoll) at pipeline construction time results in an iterable&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># of the actual elements of pcoll being passed into each process invocation. In this example, side inputs are&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># passed to a FlatMap transform as extra arguments and consumed by filter_using_length.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Callable takes additional arguments.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">filter_using_length&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">lower_bound&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">upper_bound&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">float&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;inf&amp;#39;&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">lower_bound&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="n">upper_bound&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">word&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Construct a deferred side input.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">avg_word_len&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">words&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombineGlobally&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">combiners&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MeanCombineFn&lt;/span>&lt;span class="p">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Call with explicit side inputs.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">small_words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;small&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FlatMap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">filter_using_length&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># A single deferred side input.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">larger_than_average&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">words&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;large&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FlatMap&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">filter_using_length&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">lower_bound&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">pvalue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AsSingleton&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">avg_word_len&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Mix and match.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">small_but_nontrivial&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FlatMap&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">filter_using_length&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">lower_bound&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">upper_bound&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">pvalue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">AsSingleton&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">avg_word_len&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># We can also pass side inputs to a ParDo transform, which will get passed to its process method.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The first two arguments for the process method would be self and element.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">FilterUsingLength&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">lower_bound&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">upper_bound&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">float&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;inf&amp;#39;&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">lower_bound&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="n">upper_bound&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">element&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">small_words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">FilterUsingLength&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
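&lt;p class="language-py">To make the data flow concrete without a runner, the following plain-Python sketch mimics what the snippet above does: a single value is computed from the data and then handed to the callable as an extra argument on every per-element call. It is only an analogy and does not use the Beam API. The helper &lt;code>flat_map_with_side_inputs&lt;/code> and the sample word list are hypothetical, introduced here for illustration.&lt;/p>

```python
from statistics import mean


def filter_using_length(word, lower_bound, upper_bound=float("inf")):
    """Same callable shape as in the snippet above: element first, extras after."""
    if len(word) >= lower_bound and upper_bound >= len(word):
        yield word


def flat_map_with_side_inputs(elements, fn, *args, **kwargs):
    """Hypothetical stand-in for beam.FlatMap (not a Beam API).

    Calls fn once per element and forwards the already-resolved
    side-input values as extra positional or keyword arguments.
    """
    for element in elements:
        yield from fn(element, *args, **kwargs)


# Assumed sample data; in a pipeline this would be a PCollection.
words = ["a", "to", "beam", "pipeline", "runner"]

# Stand-in for the deferred singleton: derived from the data, not hard-coded.
avg_word_len = mean(len(w) for w in words)

# A single side input passed by keyword.
larger_than_average = list(
    flat_map_with_side_inputs(
        words, filter_using_length, lower_bound=avg_word_len))

# Mix a literal value with a value derived from the data.
small_but_nontrivial = list(
    flat_map_with_side_inputs(
        words, filter_using_length, lower_bound=2, upper_bound=avg_word_len))
```

&lt;p class="language-py">In a real pipeline the runner, not your code, resolves the deferred value before invoking the callable, which is why the callable itself only ever sees an ordinary number.&lt;/p>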
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Side inputs are provided using `beam.SideInput` in the DoFn&amp;#39;s ProcessElement method.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Side inputs can be arbitrary PCollections, which can then be iterated over per element
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// in a DoFn.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Side input parameters appear after main input elements, and before any output emitters.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">words&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// avgWordLength is a PCollection containing a single element, a singleton.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">avgWordLength&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">stats&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Mean&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">wordLengths&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Side inputs are added with the beam.SideInput option to beam.ParDo.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">wordsAboveCutOff&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">filterWordsAbove&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SideInput&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">Input&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">avgWordLength&lt;/span>&lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">wordsBelowCutOff&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">filterWordsBelow&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">SideInput&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">Input&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">avgWordLength&lt;/span>&lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// filterWordsAbove is a DoFn that takes in a word,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// and a singleton side input iterator over a length cut off
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// and only emits words that are above that cut off.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">//
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// If the iterator has no elements, an error is returned, aborting processing.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="nf">filterWordsAbove&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">lengthCutOffIter&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="kt">float64&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">bool&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitAboveCutoff&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">var&lt;/span> &lt;span class="nx">cutOff&lt;/span> &lt;span class="kt">float64&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">ok&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">lengthCutOffIter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">cutOff&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">ok&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Errorf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;no length cutoff provided&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nb">float64&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">&amp;gt;&lt;/span> &lt;span class="nx">cutOff&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emitAboveCutoff&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// filterWordsBelow is a DoFn that takes in a word,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// and a singleton side input of a length cut off
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// and only emits words that are beneath that cut off.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">//
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// If the side input isn&amp;#39;t a singleton, a runtime panic will occur.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="nf">filterWordsBelow&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">lengthCutOff&lt;/span> &lt;span class="kt">float64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitBelowCutoff&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nb">float64&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="nx">lengthCutOff&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emitBelowCutoff&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function3x1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">filterWordsAbove&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function3x0&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">filterWordsBelow&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// 1 input of type string =&amp;gt; Emitter1[string]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Emitter1&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">]()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// 1 input of type float64 =&amp;gt; Iter1[float64]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Iter1&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">float64&lt;/span>&lt;span class="p">]()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The Go SDK doesn&amp;#39;t support custom ViewFns.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// See https://github.com/apache/beam/issues/18602 for details
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// on how to contribute them!
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Side inputs are provided by passing an extra context object to
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// `map`, `flatMap`, or `parDo` transforms. This object will get passed as an
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// extra argument to the provided function (or `process` method of the `DoFn`).
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// `SideInputParam` properties (generally created with `pardo.xxxSideInput(...)`)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// have a `lookup` method that can be invoked from within the process method.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Let words be a PCollection of strings.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kr">const&lt;/span> &lt;span class="nx">words&lt;/span> : &lt;span class="kt">PCollection&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">string&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// meanLengthPColl will contain a single number whose value is the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// average length of the words
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kr">const&lt;/span> &lt;span class="nx">meanLengthPColl&lt;/span>: &lt;span class="kt">PCollection&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">number&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">words&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">groupGlobally&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">string&lt;/span>&lt;span class="p">&amp;gt;()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">combining&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">word&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">length&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">combiners&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">mean&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;mean&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">map&lt;/span>&lt;span class="p">(({&lt;/span> &lt;span class="nx">mean&lt;/span> &lt;span class="p">})&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">mean&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Now we use this as a side input to yield only words that are
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// smaller than average.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kr">const&lt;/span> &lt;span class="nx">smallWords&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">flatMap&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// This is the function, taking context as a second argument.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">function&lt;/span>&lt;span class="o">*&lt;/span> &lt;span class="nx">keepSmall&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">length&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">meanLength&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">lookup&lt;/span>&lt;span class="p">())&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="nx">word&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// This is the context that will be passed as a second argument.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">meanLength&lt;/span>: &lt;span class="kt">pardo.singletonSideInput&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">meanLengthPColl&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="side-inputs-windowing">4.4.2. Side inputs and windowing&lt;/h4>
&lt;p>A windowed &lt;code>PCollection&lt;/code> may be infinite and thus cannot be compressed into a
single value (or single collection class). When you create a &lt;code>PCollectionView&lt;/code>
of a windowed &lt;code>PCollection&lt;/code>, the &lt;code>PCollectionView&lt;/code> represents a single entity
per window (one singleton per window, one list per window, etc.).&lt;/p>
&lt;p>Beam uses the window(s) for the main input element to look up the appropriate
window for the side input element. Beam projects the main input element&amp;rsquo;s window
into the side input&amp;rsquo;s window set, and then uses the side input from the
resulting window. If the main input and side inputs have identical windows, the
projection provides the exact corresponding window. However, if the inputs have
different windows, Beam uses the projection to choose the most appropriate side
input window.&lt;/p>
&lt;p>For example, if the main input is windowed using fixed-time windows of one
minute, and the side input is windowed using fixed-time windows of one hour,
Beam projects the main input window against the side input window set and
selects the side input value from the appropriate hour-long side input window.&lt;/p>
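&lt;p>The one-minute and one-hour example above can be sketched in plain Python. This is a simplified model rather than Beam code: the helper names &lt;code>fixed_window&lt;/code> and &lt;code>side_input_window&lt;/code> are hypothetical, windows are represented as half-open (start, end) pairs in whole seconds, and the projection is assumed to pick the side input window that contains the last timestamp of the main input window.&lt;/p>

```python
def fixed_window(timestamp_s, size_s):
    """Return the (start, end) fixed window of size_s seconds containing timestamp_s."""
    start = timestamp_s - timestamp_s % size_s
    return (start, start + size_s)


def side_input_window(main_window, side_size_s):
    """Project a main input window onto fixed side input windows of side_size_s seconds.

    Assumption for this sketch: the projection uses the last timestamp that
    still belongs to the main window (its end is exclusive).
    """
    main_start, main_end = main_window
    return fixed_window(main_end - 1, side_size_s)


# A main input element observed at 03:17:42, expressed in seconds.
element_timestamp = 3 * 3600 + 17 * 60 + 42

main = fixed_window(element_timestamp, 60)   # one-minute main input window
side = side_input_window(main, 3600)         # one-hour side input window

print(main)  # (11820, 11880), that is 03:17:00 to 03:18:00
print(side)  # (10800, 14400), that is 03:00:00 to 04:00:00
```

&lt;p>In this model, every one-minute window between 03:00:00 and 04:00:00 maps to the same hour-long side input window, so all main input elements in that hour read the same side input value.&lt;/p>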
&lt;p>If the main input element exists in more than one window, then &lt;code>processElement&lt;/code>
gets called multiple times, once for each window. Each call to &lt;code>processElement&lt;/code>
projects the &amp;ldquo;current&amp;rdquo; window for the main input element, and thus might provide
a different view of the side input each time.&lt;/p>
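&lt;p>As a rough illustration of the multi-window case, the sketch below models a main input that uses sliding windows and a side input that uses one-hour fixed windows. It is plain Python, not Beam code: &lt;code>sliding_windows&lt;/code>, &lt;code>fixed_window&lt;/code>, and &lt;code>process_element&lt;/code> are hypothetical helpers, times are whole seconds, and the projection is assumed to use the last timestamp inside each main input window.&lt;/p>

```python
def fixed_window(timestamp_s, size_s):
    """Return the (start, end) fixed window of size_s seconds containing timestamp_s."""
    start = timestamp_s - timestamp_s % size_s
    return (start, start + size_s)


def sliding_windows(timestamp_s, size_s, period_s):
    """Return every (start, end) sliding window that contains timestamp_s."""
    last_start = timestamp_s - timestamp_s % period_s
    # Walk backwards one period at a time while the window still covers the timestamp.
    starts = range(last_start, timestamp_s - size_s, -period_s)
    return [(start, start + size_s) for start in starts]


def process_element(word, main_window, side_size_s):
    """Hypothetical stand-in for one per-window processElement call."""
    # Assumption: project using the last timestamp inside the main window.
    side_window = fixed_window(main_window[1] - 1, side_size_s)
    return (word, main_window, side_window)


# An element at 00:59:50 falls into two two-minute windows that slide every minute.
calls = [process_element("beam", w, 3600) for w in sliding_windows(3590, 120, 60)]
for call in calls:
    print(call)
# ('beam', (3540, 3660), (3600, 7200))
# ('beam', (3480, 3600), (0, 3600))
```

&lt;p>The same element is processed twice, and because the two main input windows end in different hours, each call reads from a different side input window.&lt;/p>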
&lt;p>If the side input has multiple trigger firings, Beam uses the value from the
latest trigger firing. This is particularly useful if you use a side input with
a single global window and specify a trigger.&lt;/p>
&lt;h3 id="additional-outputs">4.5. Additional outputs&lt;/h3>
&lt;p class="language-java language-py">While &lt;code>ParDo&lt;/code> always produces a main output &lt;code>PCollection&lt;/code> (as the return value
from &lt;code>apply&lt;/code>), you can also have your &lt;code>ParDo&lt;/code> produce any number of additional
output &lt;code>PCollection&lt;/code>s. If you choose to have multiple outputs, your &lt;code>ParDo&lt;/code>
returns all of the output &lt;code>PCollection&lt;/code>s (including the main output) bundled
together.&lt;/p>
&lt;p class="language-go">While &lt;code>beam.ParDo&lt;/code> always produces an output &lt;code>PCollection&lt;/code>, your &lt;code>DoFn&lt;/code> can produce any
number of additional output &lt;code>PCollection&lt;/code>s, or even none at all.
If you choose to have multiple outputs, call your &lt;code>DoFn&lt;/code> with the &lt;code>ParDo&lt;/code>
function that matches the number of outputs: &lt;code>beam.ParDo2&lt;/code> for two output &lt;code>PCollection&lt;/code>s,
&lt;code>beam.ParDo3&lt;/code> for three, and so on up to &lt;code>beam.ParDo7&lt;/code>. If you need more, you can
use &lt;code>beam.ParDoN&lt;/code>, which returns a &lt;code>[]beam.PCollection&lt;/code>.&lt;/p>
&lt;p class="language-typescript">&lt;code>ParDo&lt;/code> always produces a main output &lt;code>PCollection&lt;/code> (as the return value
from &lt;code>apply&lt;/code>). If you want to have multiple outputs, emit an object with distinct
properties in your &lt;code>ParDo&lt;/code> operation and follow this operation with a &lt;code>Split&lt;/code>
to break it into multiple &lt;code>PCollection&lt;/code>s.&lt;/p>
&lt;p class="language-yaml">In Beam YAML, one obtains multiple outputs by emitting all outputs to a single
&lt;code>PCollection&lt;/code>, possibly with an extra field, and then using &lt;code>Partition&lt;/code> to
split this single &lt;code>PCollection&lt;/code> into multiple distinct &lt;code>PCollection&lt;/code>
outputs.&lt;/p>
&lt;h4 id="output-tags">4.5.1. Tags for multiple outputs&lt;/h4>
&lt;p class="language-typescript">The &lt;code>Split&lt;/code> PTransform takes a PCollection of elements of the form
&lt;code>{tagA?: A, tagB?: B, ...}&lt;/code> and returns an object
&lt;code>{tagA: PCollection&amp;lt;A&amp;gt;, tagB: PCollection&amp;lt;B&amp;gt;, ...}&lt;/code>.
The set of expected tags is passed to the operation; how multiple or
unknown tags are handled can be specified by passing a non-default
&lt;code>SplitOptions&lt;/code> instance.&lt;/p>
&lt;p class="language-go">The Go SDK doesn&amp;rsquo;t use output tags, and instead uses positional ordering for
multiple output PCollections.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// To emit elements to multiple output PCollections, create a TupleTag object to identify each collection
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// that your ParDo produces. For example, if your ParDo produces three output PCollections (the main output
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// and two additional outputs), you must create three TupleTags. The following example code shows how to
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// create TupleTags for a ParDo with three output PCollections.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Input PCollection to our ParDo.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The ParDo will filter words whose length is below a cutoff and add them to
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// the main output PCollection&amp;lt;String&amp;gt;.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// If a word is above the cutoff, the ParDo will add the word length to an
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// output PCollection&amp;lt;Integer&amp;gt;.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// If a word starts with the string &amp;#34;MARKER&amp;#34;, the ParDo will add that word to an
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// output PCollection&amp;lt;String&amp;gt;.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="n">wordLengthCutOff&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">10&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Create three TupleTags, one for each output PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Output that contains words below the length cutoff.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">wordsBelowCutOffTag&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;(){};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Output that contains word lengths.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">wordLengthsAboveCutOffTag&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;(){};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Output that contains &amp;#34;MARKER&amp;#34; words.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">markedWordsTag&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;(){};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Passing Output Tags to ParDo:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// After you specify the TupleTags for each of your ParDo outputs, pass the tags to your ParDo by invoking
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// .withOutputTags. You pass the tag for the main output first, and then the tags for any additional outputs
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// in a TupleTagList. Building on our previous example, we pass the three TupleTags for our three output
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// PCollections to our ParDo. Note that all of the outputs (including the main output PCollection) are
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// bundled into the returned PCollectionTuple.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollectionTuple&lt;/span> &lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">words&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// DoFn continues here.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Specify the tag for the main output.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">withOutputTags&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">wordsBelowCutOffTag&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Specify the tags for the two additional outputs as a TupleTagList.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">TupleTagList&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">wordLengthsAboveCutOffTag&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">markedWordsTag&lt;/span>&lt;span class="o">)));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># To emit elements to multiple output PCollections, invoke with_outputs() on the ParDo, and specify the&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># expected tags for the outputs. with_outputs() returns a DoOutputsTuple object. Tags specified in&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># with_outputs are attributes on the returned DoOutputsTuple object. The tags give access to the&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># corresponding output PCollections.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">words&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ProcessWords&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">cutoff_length&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">marker&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;x&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">with_outputs&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;above_cutoff_lengths&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;marked strings&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">main&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;below_cutoff_strings&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">below&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">results&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">below_cutoff_strings&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">above&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">results&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">above_cutoff_lengths&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">marked&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">results&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;marked strings&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1"># indexing works as well&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The result is also iterable, ordered in the same order that the tags were passed to with_outputs(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># the main tag (if specified) first.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">below&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">above&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">marked&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">words&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ProcessWords&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">cutoff_length&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">marker&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;x&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">with_outputs&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;above_cutoff_lengths&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;marked strings&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">main&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;below_cutoff_strings&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// beam.ParDo3 returns PCollections in the same order as
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// the emit function parameters in processWords.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">below&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">above&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">marked&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo3&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">processWords&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// processWordsMixed uses both a standard return and an emitter function.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The standard return produces the first PCollection from beam.ParDo2,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// and the emitter produces the second PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">length&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">mixedMarked&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo2&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">processWordsMixed&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Create three PCollections from a single input PCollection.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">below&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">above&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">marked&lt;/span> &lt;span class="p">}&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">to_split&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">split&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="s2">&amp;#34;below&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;above&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;marked&amp;#34;&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="multiple-outputs-dofn">4.5.2. Emitting to multiple outputs in your DoFn&lt;/h4>
&lt;p class="language-go">Call emitter functions as needed to produce zero or more elements for their matching
&lt;code>PCollection&lt;/code>. The same value can be emitted with multiple emitters.
As normal, do not mutate values after emitting them from any emitter.&lt;/p>
&lt;p class="language-go">All emitters should be registered using a generic &lt;code>register.EmitterX[...]&lt;/code>
function. This optimizes runtime execution of the emitter.&lt;/p>
&lt;p class="language-go">DoFns can also return a single element via the standard return.
The standard return is always the first PCollection returned from beam.ParDo.
Other emitters output to their own PCollections in their defined parameter order.&lt;/p>
&lt;p class="language-yaml">&lt;code>MapToFields&lt;/code> is always one-to-one. To perform a one-to-many mapping one can
first map a field to an iterable type and then follow this transform with an
&lt;code>Explode&lt;/code> transform that will emit multiple values, one per value of the
exploded field.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Inside your ParDo&amp;#39;s DoFn, you can emit an element to a specific output PCollection by providing a
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// MultiOutputReceiver to your process method, and passing in the appropriate TupleTag to obtain an OutputReceiver.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// After your ParDo, extract the resulting output PCollections from the returned PCollectionTuple.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Based on the previous example, this shows the DoFn emitting to the main output and two additional outputs.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">MultiOutputReceiver&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">length&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="n">wordLengthCutOff&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Emit short word to the main output.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// In this example, it is the output with tag wordsBelowCutOffTag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">get&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">wordsBelowCutOffTag&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Emit long word length to the output with tag wordLengthsAboveCutOffTag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">get&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">wordLengthsAboveCutOffTag&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">length&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">startsWith&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;MARKER&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Emit word to the output with tag markedWordsTag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">get&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">markedWordsTag&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">word&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Inside your ParDo&amp;#39;s DoFn, you can emit an element to a specific output by wrapping the output tag (str) and the value&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># in the pvalue.TaggedOutput wrapper class.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Based on the previous example, this shows the DoFn emitting to the main output and two additional outputs.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ProcessWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">cutoff_length&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">marker&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="n">cutoff_length&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Emit this short word to the main output.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">element&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">else&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Emit this word&amp;#39;s long length to the &amp;#39;above_cutoff_lengths&amp;#39; output.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">pvalue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TaggedOutput&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;above_cutoff_lengths&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">startswith&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">marker&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Emit this word to a different output with the &amp;#39;marked strings&amp;#39; tag.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">pvalue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TaggedOutput&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;marked strings&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Producing multiple outputs is also available in Map and FlatMap.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Here is an example that uses FlatMap and shows that the tags do not need to be specified ahead of time.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">even_odd&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">pvalue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TaggedOutput&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;odd&amp;#39;&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">2&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="s1">&amp;#39;even&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">10&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">x&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">numbers&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FlatMap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">even_odd&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">with_outputs&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">evens&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">results&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">even&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">odds&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">results&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">odd&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">tens&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">results&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kc">None&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1"># the undeclared main output&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// processWords is a DoFn that has 3 output PCollections. The emitter functions
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// are matched in positional order to the PCollections returned by beam.ParDo3.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="nf">processWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitBelowCutoff&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitAboveCutoff&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitMarked&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">const&lt;/span> &lt;span class="nx">cutOff&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="mi">5&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">&amp;lt;&lt;/span> &lt;span class="nx">cutOff&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emitBelowCutoff&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emitAboveCutoff&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nf">isMarkedWord&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emitMarked&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// processWordsMixed demonstrates mixing an emitter, with a standard return.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// If a standard return is used, it will always be the first returned PCollection,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// followed in positional order by the emitter functions.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="nf">processWordsMixed&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitMarked&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nf">isMarkedWord&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emitMarked&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function4x0&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">processWords&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Function2x1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">processWordsMixed&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// 1 input of type string =&amp;gt; Emitter1[string]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">register&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Emitter1&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">]()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">to_split&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">flatMap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">function&lt;/span>&lt;span class="o">*&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">length&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="mi">5&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">below&lt;/span>: &lt;span class="kt">word&lt;/span> &lt;span class="p">};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">above&lt;/span>: &lt;span class="kt">word&lt;/span> &lt;span class="p">};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">isMarkedWord&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">marked&lt;/span>: &lt;span class="kt">word&lt;/span> &lt;span class="p">};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">});&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">- &lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MapToFields&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">SomeProducingTransform&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">language&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">python&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fields&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">word&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;line.split()&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>- &lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Explode&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">input&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MapToFields&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fields&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">word&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
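&lt;p class="language-py">To check the branching of a multi-output &lt;code>process&lt;/code> method without a runner, the routing can be modelled in plain Python. The sketch below does not import Beam: &lt;code>TaggedOutput&lt;/code> is a stand-in tuple for &lt;code>pvalue.TaggedOutput&lt;/code>, and &lt;code>route&lt;/code> is a hypothetical helper that only approximates what &lt;code>with_outputs()&lt;/code> exposes, with untagged values collected under &lt;code>None&lt;/code> as the main output.&lt;/p>

```python
from collections import defaultdict, namedtuple

# Stand-in for apache_beam.pvalue.TaggedOutput (assumption: tag first, then value).
TaggedOutput = namedtuple("TaggedOutput", ["tag", "value"])


def process_words(element, cutoff_length, marker):
    """Same branching as the ProcessWords DoFn, written as a plain generator."""
    if cutoff_length >= len(element):
        # Short word: goes to the main (untagged) output.
        yield element
    else:
        # Long word: only its length goes to 'above_cutoff_lengths'.
        yield TaggedOutput("above_cutoff_lengths", len(element))
    if element.startswith(marker):
        # Marked word: emitted in addition to whichever branch ran above.
        yield TaggedOutput("marked strings", element)


def route(elements, fn, *args):
    """Hypothetical helper: group everything fn yields by output tag (None = main output)."""
    outputs = defaultdict(list)
    for element in elements:
        for result in fn(element, *args):
            if isinstance(result, TaggedOutput):
                outputs[result.tag].append(result.value)
            else:
                outputs[None].append(result)
    return dict(outputs)


routed = route(["a", "MARKERxy", "elephant", "MARK"], process_words, 5, "MARKER")
print(routed)
```

&lt;p class="language-py">In this model a single element can land in more than one output: &lt;code>MARKERxy&lt;/code> contributes both a length and a marked string.&lt;/p>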
&lt;h4 id="other-dofn-parameters">4.5.3. Accessing additional parameters in your DoFn&lt;/h4>
&lt;p class="language-java">In addition to the element and the &lt;code>OutputReceiver&lt;/code>, Beam can populate other parameters of your DoFn&amp;rsquo;s &lt;code>@ProcessElement&lt;/code> method.
You can add any combination of these parameters to your process method, in any order.&lt;/p>
&lt;p class="language-py">In addition to the element, Beam can populate other parameters of your DoFn&amp;rsquo;s &lt;code>process&lt;/code> method.
You can add any combination of these parameters to your process method, in any order.&lt;/p>
&lt;p class="language-typescript">In addition to the element, Beam can populate other parameters of your DoFn&amp;rsquo;s &lt;code>process&lt;/code> method.
These are available by placing accessors in the context argument, just as for side inputs.&lt;/p>
&lt;p class="language-go">In addition to the element, Beam can populate other parameters of your DoFn&amp;rsquo;s &lt;code>ProcessElement&lt;/code> method.
You can add any combination of these parameters to your process method, but they must appear in a standard order.&lt;/p>
&lt;p class="language-go">&lt;strong>context.Context:&lt;/strong>
To support consolidated logging and user-defined metrics, you can request a &lt;code>context.Context&lt;/code> parameter.
Per Go conventions, if present, it must be the first parameter of the &lt;code>DoFn&lt;/code> method.&lt;/p>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">MyDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">&lt;strong>Timestamp:&lt;/strong>
To access the timestamp of an input element, add a parameter annotated with &lt;code>@Timestamp&lt;/code> of type &lt;code>Instant&lt;/code>. For example:&lt;/p>
&lt;p class="language-py">&lt;strong>Timestamp:&lt;/strong>
To access the timestamp of an input element, add a keyword parameter that defaults to &lt;code>DoFn.TimestampParam&lt;/code>. For example:&lt;/p>
&lt;p class="language-go">&lt;strong>Timestamp:&lt;/strong>
To access the timestamp of an input element, add a &lt;code>beam.EventTime&lt;/code> parameter before the element. For example:&lt;/p>
&lt;p class="language-typescript">&lt;strong>Timestamp:&lt;/strong>
To access the timestamp of an input element, add a &lt;code>pardo.timestampParam()&lt;/code> to the context argument. For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="nd">@Timestamp&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">timestamp&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}})&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ProcessRecord&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">timestamp&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampParam&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># access timestamp of element.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">pass&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">MyDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ts&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">function&lt;/span> &lt;span class="nx">processFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">timestamp&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">lookup&lt;/span>&lt;span class="p">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">pcoll&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">processFn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">timestamp&lt;/span>: &lt;span class="kt">pardo.timestampParam&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">});&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
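&lt;p class="language-py">In the Python SDK, the extra value is requested through the parameter&amp;rsquo;s default. The following is a simplified plain-Python model of that idea, not Beam&amp;rsquo;s actual implementation: &lt;code>TIMESTAMP_PARAM&lt;/code> is a stand-in sentinel for &lt;code>DoFn.TimestampParam&lt;/code>, and &lt;code>invoke&lt;/code> is a hypothetical helper that inspects the function signature and replaces the sentinel with the element&amp;rsquo;s timestamp, given here in seconds since the Unix epoch.&lt;/p>

```python
import inspect
from datetime import datetime, timezone

# Stand-in sentinel for beam.DoFn.TimestampParam (assumption: a unique marker object).
TIMESTAMP_PARAM = object()


def process(element, timestamp=TIMESTAMP_PARAM):
    """User code: asks for the timestamp by using the sentinel as a default."""
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return f"{element} at {when.isoformat()}"


def invoke(fn, element, element_timestamp):
    """Hypothetical runner-side helper: fill in parameters whose default is the sentinel."""
    injected = {
        name: element_timestamp
        for name, param in inspect.signature(fn).parameters.items()
        if param.default is TIMESTAMP_PARAM
    }
    return fn(element, **injected)


# One day after the epoch, expressed in seconds for this sketch.
result = invoke(process, "click", 86400)
print(result)
```

&lt;p class="language-py">Because the request is carried by the default value rather than by position, the parameter name is free to choose in this model.&lt;/p>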
&lt;p class="language-java">&lt;strong>Window:&lt;/strong>
To access the window an input element falls into, add a parameter of the type of the window used for the input &lt;code>PCollection&lt;/code>.
For example, when fixed windows are being used, the window is of type &lt;code>IntervalWindow&lt;/code>.
If the parameter is a window type (a subclass of &lt;code>BoundedWindow&lt;/code>) that does not match the input &lt;code>PCollection&lt;/code>, then an error
will be raised. If an element falls in multiple windows (for example, this will happen when using &lt;code>SlidingWindows&lt;/code>), then the
&lt;code>@ProcessElement&lt;/code> method will be invoked multiple times for the element, once for each window.&lt;/p>
&lt;p class="language-py">&lt;strong>Window:&lt;/strong>
To access the window an input element falls into, add a keyword parameter that defaults to &lt;code>DoFn.WindowParam&lt;/code>.
If an element falls in multiple windows (for example, this will happen when using &lt;code>SlidingWindows&lt;/code>), then the
&lt;code>process&lt;/code> method will be invoked multiple times for the element, once for each window.&lt;/p>
&lt;p class="language-go">&lt;strong>Window:&lt;/strong>
To access the window an input element falls into, add a &lt;code>beam.Window&lt;/code> parameter before the element.
If an element falls in multiple windows (for example, this will happen when using SlidingWindows),
then the &lt;code>ProcessElement&lt;/code> method will be invoked multiple times for the element, once for each window.
Since &lt;code>beam.Window&lt;/code> is an interface, it&amp;rsquo;s possible to type assert to the concrete implementation of the window.
For example, when fixed windows are being used, the window is of type &lt;code>window.IntervalWindow&lt;/code>.&lt;/p>
&lt;p class="language-typescript">&lt;strong>Window:&lt;/strong>
To access the window an input element falls into, add a &lt;code>pardo.windowParam()&lt;/code> to the context argument.
If an element falls in multiple windows (for example, this will happen when using &lt;code>SlidingWindows&lt;/code>), then the
function will be invoked multiple times for the element, once for each window.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">IntervalWindow&lt;/span> &lt;span class="n">window&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}})&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ProcessRecord&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">window&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowParam&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># access window e.g. window.end.micros&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">pass&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">MyDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">iw&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="p">.(&lt;/span>&lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">IntervalWindow&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">pcoll&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">processWithWindow&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">window&lt;/span>: &lt;span class="kt">pardo.windowParam&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">});&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
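&lt;p class="language-py">The once-per-window behavior described above can be illustrated without Beam. The following is a minimal plain-Python sketch, not the SDK&amp;rsquo;s implementation: the helper names &lt;code>sliding_window_starts&lt;/code> and &lt;code>invoke_per_window&lt;/code> are hypothetical, timestamps are plain integers, and each window is modeled as a half-open interval from its start up to, but not including, its end.&lt;/p>

```python
def sliding_window_starts(timestamp, size, period, offset=0):
    """Return the start of every sliding window that contains `timestamp`.

    All arguments are integers in the same time unit. A window starting at
    `s` covers the half-open interval [s, s + size).
    """
    # Start of the most recent window that begins at or before the timestamp.
    last_start = timestamp - ((timestamp - offset) % period)
    # Step backwards one period at a time while the window still covers it.
    return list(range(last_start, timestamp - size, -period))


def invoke_per_window(process, element, timestamp, size, period):
    """Call `process` once for each window the element belongs to."""
    return [
        process(element, (start, start + size))
        for start in sliding_window_starts(timestamp, size, period)
    ]


# An element at time 25 with 30-unit windows sliding every 10 units
# belongs to three windows, so the function runs three times.
calls = invoke_per_window(
    lambda element, window: (element, window),
    "word", timestamp=25, size=30, period=10)
print(calls)
```

&lt;p class="language-py">With fixed windows the size equals the period, so the same helper returns exactly one window start and the function runs once per element.&lt;/p>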
&lt;p class="language-java">&lt;strong>PaneInfo:&lt;/strong>
When triggers are used, Beam provides a &lt;code>PaneInfo&lt;/code> object that contains information about the current firing. Using &lt;code>PaneInfo&lt;/code>
you can determine whether this is an early or a late firing, and how many times this window has already fired for this key.&lt;/p>
&lt;p class="language-py">&lt;strong>PaneInfo:&lt;/strong>
When triggers are used, Beam provides a &lt;code>DoFn.PaneInfoParam&lt;/code> object that contains information about the current firing. Using &lt;code>DoFn.PaneInfoParam&lt;/code>
you can determine whether this is an early or a late firing, and how many times this window has already fired for this key.
The Python SDK&amp;rsquo;s implementation of this feature is not yet complete; see &lt;a href="https://github.com/apache/beam/issues/18721">Issue 18721&lt;/a> for details.&lt;/p>
&lt;p class="language-go">&lt;strong>PaneInfo:&lt;/strong>
When triggers are used, Beam provides a &lt;code>beam.PaneInfo&lt;/code> object that contains information about the current firing. Using &lt;code>beam.PaneInfo&lt;/code>
you can determine whether this is an early or a late firing, and how many times this window has already fired for this key.&lt;/p>
&lt;p class="language-typescript">&lt;strong>PaneInfo:&lt;/strong>
To access information about the current firing, add a &lt;code>pardo.paneInfoParam()&lt;/code> to the context argument.
Using the resulting pane info you can determine whether this is an early or a late firing,
and how many times this window has already fired for this key.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PaneInfo&lt;/span> &lt;span class="n">paneInfo&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}})&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ProcessRecord&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pane_info&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PaneInfoParam&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># access pane info, e.g. pane_info.is_first, pane_info.is_last, pane_info.timing&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">pass&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">extractWordsFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">pn&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PaneInfo&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">line&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitWords&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">pn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Timing&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="nx">typex&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PaneEarly&lt;/span> &lt;span class="o">||&lt;/span> &lt;span class="nx">pn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Timing&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="nx">typex&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PaneOnTime&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ... perform operation ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">pn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Timing&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="nx">typex&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PaneLate&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ... perform operation ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">pn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">IsFirst&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ... perform operation ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">pn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">IsLast&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ... perform operation ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">words&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">strings&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Split&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34; &amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="k">range&lt;/span> &lt;span class="nx">words&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emitWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">w&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">pcoll&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">processWithPaneInfo&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">paneInfo&lt;/span>: &lt;span class="kt">pardo.paneInfoParam&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">});&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
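&lt;p class="language-py">To make the pane fields concrete, here is a small plain-Python sketch of how a firing might be summarized. &lt;code>PaneSketch&lt;/code> and &lt;code>describe_firing&lt;/code> are hypothetical stand-ins written for this illustration, not Beam classes; the sketch assumes the pane carries a timing label and a zero-based firing index, so the index equals the number of earlier firings for the same key and window.&lt;/p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaneSketch:
    """Hypothetical stand-in for the pane info passed to a process method."""
    timing: str     # "EARLY", "ON_TIME", or "LATE"
    index: int      # zero-based index of this firing
    is_first: bool  # True for the first firing of the window
    is_last: bool   # True for the final firing of the window


def describe_firing(pane):
    """Summarize the timing of a pane and how many firings preceded it."""
    # A zero-based index is also the count of firings that came before.
    return "{} firing, {} earlier firing(s) for this key and window".format(
        pane.timing, pane.index)


print(describe_firing(PaneSketch("EARLY", 0, True, False)))
print(describe_firing(PaneSketch("LATE", 3, False, True)))
```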
&lt;p class="language-java">&lt;strong>PipelineOptions:&lt;/strong>
The &lt;code>PipelineOptions&lt;/code> for the current pipeline can always be accessed in a process method by adding it
as a parameter:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">word&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PipelineOptions&lt;/span> &lt;span class="n">options&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}})&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">&lt;code>@OnTimer&lt;/code> methods can also access many of these parameters. Timestamp, Window, key, &lt;code>PipelineOptions&lt;/code>, &lt;code>OutputReceiver&lt;/code>, and
&lt;code>MultiOutputReceiver&lt;/code> parameters can all be accessed in an &lt;code>@OnTimer&lt;/code> method. In addition, an &lt;code>@OnTimer&lt;/code> method can take
a parameter of type &lt;code>TimeDomain&lt;/code> which tells whether the timer is based on event time or processing time.
Timers are explained in more detail in the
&lt;a href="/blog/2017/08/28/timely-processing.html">Timely (and Stateful) Processing with Apache Beam&lt;/a> blog post.&lt;/p>
&lt;p class="language-py">&lt;strong>Timer and State:&lt;/strong>
In addition to the aforementioned parameters, user-defined Timer and State parameters can be used in a stateful DoFn.
Timers and States are explained in more detail in the
&lt;a href="/blog/2017/08/28/timely-processing.html">Timely (and Stateful) Processing with Apache Beam&lt;/a> blog post.&lt;/p>
&lt;p class="language-go">&lt;strong>Timer and State:&lt;/strong>
User-defined State and Timer parameters can be used in a stateful DoFn.
Timers and States are explained in more detail in the
&lt;a href="/blog/2017/08/28/timely-processing.html">Timely (and Stateful) Processing with Apache Beam&lt;/a> blog post.&lt;/p>
&lt;p class="language-typescript">&lt;strong>Timer and State:&lt;/strong>
This feature isn&amp;rsquo;t yet implemented in the TypeScript SDK,
but we welcome &lt;a href="/contribute/">contributions&lt;/a>.
In the meantime, TypeScript pipelines wishing to use state and timers can do so
using &lt;a href="#use-x-lang-transforms">cross-language transforms&lt;/a>.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
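&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.transforms.timeutil&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">TimeDomain&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.transforms.userstate&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">BagStateSpec&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TimerSpec&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">on_timer&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">StatefulDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>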
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;An example stateful DoFn with state and timer&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">BUFFER_STATE_1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BagStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;buffer1&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">BytesCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">BUFFER_STATE_2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BagStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;buffer2&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">VarIntCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">WATERMARK_TIMER&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;watermark_timer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WATERMARK&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timestamp&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampParam&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">window&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowParam&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer_1&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">BUFFER_STATE_1&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer_2&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">BUFFER_STATE_2&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">watermark_timer&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimerParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">WATERMARK_TIMER&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Do your processing here&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">key&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">value&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">element&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Read all the data from buffer1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">all_values_in_buffer_1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">x&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">buffer_1&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">()]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">StatefulDoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_is_clear_buffer_1_required&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">all_values_in_buffer_1&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># clear the buffer data if required conditions are met.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer_1&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># add the value to buffer 2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer_2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">value&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">StatefulDoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_all_condition_met&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Clear the timer if the conditions are met and you don&amp;#39;t want to trigger&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># the callback method.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">watermark_timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">element&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@on_timer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">WATERMARK_TIMER&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">on_expiry_1&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timestamp&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampParam&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">window&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowParam&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">KeyParam&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer_1&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">BUFFER_STATE_1&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer_2&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">BUFFER_STATE_2&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Window and key parameters are really useful especially for debugging issues.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="s1">&amp;#39;expired1&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@staticmethod&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">_all_condition_met&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># some logic&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">True&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@staticmethod&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">_is_clear_buffer_1_required&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">buffer_1_data&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Some business logic&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">True&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
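&lt;p class="language-py">The interplay between a bag state and an event-time timer can also be modeled outside a pipeline. The sketch below is a toy, single-process simulation under stated assumptions, not how a runner implements state: the class &lt;code>BufferUntilTimer&lt;/code> and its methods are hypothetical names, timestamps are plain integers, and a timer fires once the watermark reaches its timestamp.&lt;/p>

```python
from collections import defaultdict


class BufferUntilTimer:
    """Toy model of per-key bag state guarded by an event-time timer."""

    def __init__(self):
        self._bags = defaultdict(list)  # key -> buffered values (bag state)
        self._timers = {}               # key -> event time at which to fire

    def process(self, key, value, fire_at):
        """Buffer a value and set (or reset) the timer for its key."""
        self._bags[key].append(value)
        self._timers[key] = fire_at

    def clear_timer(self, key):
        """Cancel the timer for a key so its callback never runs."""
        self._timers.pop(key, None)

    def advance_watermark(self, watermark):
        """Fire every timer the watermark has reached; return flushed bags."""
        due = sorted(k for k, t in self._timers.items() if watermark >= t)
        fired = []
        for key in due:
            del self._timers[key]
            fired.append((key, self._bags.pop(key)))
        return fired


fn = BufferUntilTimer()
fn.process("a", 1, fire_at=10)
fn.process("a", 2, fire_at=10)
fn.process("b", 3, fire_at=20)
fn.clear_timer("b")  # the callback for key "b" will not run
flushed = fn.advance_watermark(15)
print(flushed)
```

&lt;p class="language-py">State and timers are scoped per key in this model, which mirrors why a stateful DoFn operates on keyed input; in a real pipeline the runner, not user code, advances the watermark and invokes the timer callback.&lt;/p>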
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// stateAndTimersFn is an example stateful DoFn with state and a timer.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">stateAndTimersFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Buffer1&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bag&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Buffer2&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bag&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Watermark&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">stateAndTimersFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ... handle processing elements here, set a callback timer...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Read all the data from Buffer1 in this window.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">vals&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ok&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Buffer1&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">ok&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">shouldClearBuffer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">vals&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// clear the buffer data if required conditions are met.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Buffer1&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Add the value to Buffer2.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Buffer2&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">allConditionsMet&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Clear the timer if certain conditions are met and you don&amp;#39;t want to trigger
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// the callback method.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Watermark&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">key&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">stateAndTimersFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">OnTimer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The window and key parameters are especially useful for debugging.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">switch&lt;/span> &lt;span class="nx">timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Watermark&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// timer expired, emit a different signal
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">key&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">stateAndTimersFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">shouldClearBuffer&lt;/span>&lt;span class="p">([]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">bool&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// some business logic
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="kc">false&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">stateAndTimersFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">allConditionsMet&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="kt">bool&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// other business logic
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="kc">true&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Not yet implemented.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="composite-transforms">4.6. Composite transforms&lt;/h3>
&lt;p>Transforms can have a nested structure, where a complex transform performs
multiple simpler transforms (such as more than one &lt;code>ParDo&lt;/code>, &lt;code>Combine&lt;/code>,
&lt;code>GroupByKey&lt;/code>, or even other composite transforms). These transforms are called
composite transforms. Nesting multiple transforms inside a single composite
transform can make your code more modular and easier to understand.&lt;/p>
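&lt;p>As a rough illustration of the nesting idea only, the following plain-Python sketch models a composite as a function that chains two simpler steps. It deliberately does not use the Beam API: lists stand in for &lt;code>PCollection&lt;/code>s, and the names &lt;code>split_into_words&lt;/code>, &lt;code>count_occurrences&lt;/code>, and &lt;code>word_frequencies&lt;/code> are hypothetical.&lt;/p>

```python
from collections import Counter


def split_into_words(lines):
    """Simple step: split every line into lowercase words."""
    return [word for line in lines for word in line.lower().split()]


def count_occurrences(items):
    """Simple step: pair each distinct item with how often it appears."""
    return sorted(Counter(items).items())


def word_frequencies(lines):
    """Composite step: hides the two simpler steps behind a single name."""
    return count_occurrences(split_into_words(lines))


# Callers use the composite without knowing about its inner steps.
result = word_frequencies(["the cat", "The dog"])
print(result)
```

&lt;p>Callers see only the composite's name, so its inner steps can change without affecting the code that uses it. A real Beam composite gives the same benefit, and it also groups the nested transforms under one name in the pipeline graph.&lt;/p>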
&lt;p>The Beam SDK comes packed with many useful composite transforms. See the API
reference pages for a list of transforms:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://beam.apache.org/releases/javadoc/2.56.0/index.html?org/apache/beam/sdk/transforms/package-summary.html">Pre-written Beam transforms for Java&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/pydoc/2.56.0/apache_beam.transforms.html">Pre-written Beam transforms for Python&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/transforms">Pre-written Beam transforms for Go&lt;/a>&lt;/li>
&lt;/ul>
&lt;h4 id="composite-transform-example">4.6.1. An example composite transform&lt;/h4>
&lt;p>The &lt;code>CountWords&lt;/code> transform in the &lt;a href="/get-started/wordcount-example/">WordCount example program&lt;/a>
is an example of a composite transform. &lt;code>CountWords&lt;/code> is a &lt;code>PTransform&lt;/code>
&lt;span class="language-java language-py">subclass&lt;/span> that consists
of multiple nested transforms.&lt;/p>
&lt;p>&lt;span class="language-java language-py">In its &lt;code>expand&lt;/code> method, the&lt;/span>
&lt;span class="language-go">The&lt;/span> &lt;code>CountWords&lt;/code> transform applies the following
transform operations:&lt;/p>
&lt;ol>
&lt;li>It applies a &lt;code>ParDo&lt;/code> on the input &lt;code>PCollection&lt;/code> of text lines, producing
an output &lt;code>PCollection&lt;/code> of individual words.&lt;/li>
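&lt;li>It applies the Beam SDK library transform &lt;code>Count&lt;/code> on the &lt;code>PCollection&lt;/code> of
words, producing a &lt;code>PCollection&lt;/code> of key/value pairs. Each key represents a
word in the text, and each value represents the number of times that word
appeared in the original data.
&lt;span class="language-py">The Python sample below also applies a final &lt;code>ParDo&lt;/code> that
formats each word and count into a printable string.&lt;/span>&lt;/li>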
&lt;/ol>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">CountWords&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">lines&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Convert lines of text into individual words.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">lines&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">ExtractWordsFn&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Count the number of times each word occurs.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">wordCounts&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">words&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Count&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">perElement&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">wordCounts&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The CountWords Composite Transform inside the WordCount pipeline.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@beam.ptransform_fn&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">CountWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">pcoll&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Convert lines of text into individual words.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ExtractWords&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ExtractWordsFn&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Count the number of times each word occurs.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">combiners&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Count&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PerElement&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Format each word and count into a printable string.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;FormatCounts&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">FormatCountsFn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// CountWords is a function that builds a composite PTransform
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// to count the number of times each word appears.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="nf">CountWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">lines&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// A subscope is required for a function to become a composite transform.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// We assign it to the original scope variable s to shadow the original
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// for the rest of the CountWords function.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">s&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scope&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;CountWords&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Since the same subscope is used for the following transforms,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// they are in the same composite PTransform.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Convert lines of text into individual words.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">words&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">extractWordsFn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">lines&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Count the number of times each word occurs.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">wordCounts&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">stats&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Count&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Return any PCollections that should be available after
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// the composite transform.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nx">wordCounts&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">function&lt;/span> &lt;span class="nx">countWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">lines&lt;/span>: &lt;span class="kt">PCollection&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">string&lt;/span>&lt;span class="p">&amp;gt;)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">lines&lt;/span> &lt;span class="c1">//
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">.&lt;/span>&lt;span class="nx">map&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="nx">s&lt;/span>: &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">toLowerCase&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">flatMap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">function&lt;/span>&lt;span class="o">*&lt;/span> &lt;span class="nx">splitWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">line&lt;/span>: &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span>&lt;span class="o">*&lt;/span> &lt;span class="nx">line&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">split&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sr">/[^a-z]+/&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">countPerElement&lt;/span>&lt;span class="p">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">counted&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">lines&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">countWords&lt;/span>&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> Because &lt;code>Count&lt;/code> is itself a composite transform,
&lt;code>CountWords&lt;/code> is also a nested composite transform.&lt;/p>
&lt;/blockquote>
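&lt;p>The nesting described in the note can be pictured as a small tree. The sketch below is a plain-Python model, not Beam code: the &lt;code>Step&lt;/code> class is hypothetical, and the inner steps listed for &lt;code>Count&lt;/code> are assumed names used only for illustration.&lt;/p>

```python
class Step:
    """Hypothetical stand-in for a node in a transform hierarchy (not a Beam class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def is_composite(self):
        # A step is composite when it wraps at least one other step.
        return len(self.children) != 0

    def depth(self):
        # A primitive step has depth 1; each level of nesting adds 1.
        return 1 + max((child.depth() for child in self.children), default=0)


# Assumed inner steps of Count; the real expansion may differ.
count = Step("Count", [Step("PairWithOne"), Step("SumPerKey")])
count_words = Step("CountWords", [Step("ExtractWords"), count])

nesting_depth = count_words.depth()
print(count.is_composite(), count_words.is_composite(), nesting_depth)
```

&lt;p>In this model, &lt;code>CountWords&lt;/code> reaches a depth of three because one of its children is itself a composite.&lt;/p>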
&lt;h4 id="composite-transform-creation">4.6.2. Creating a composite transform&lt;/h4>
&lt;p class="language-typescript">A PTransform in the TypeScript SDK is simply a function that accepts and
returns &lt;code>PValue&lt;/code>s such as &lt;code>PCollection&lt;/code>s.&lt;/p>
&lt;p class="language-java language-py">To create your own composite transform, create a subclass of the &lt;code>PTransform&lt;/code>
class and override the &lt;code>expand&lt;/code> method to specify the actual processing logic.
You can then use this transform just as you would a built-in transform from the
Beam SDK.&lt;/p>
&lt;p class="language-java">For the &lt;code>PTransform&lt;/code> class type parameters, pass the &lt;code>PCollection&lt;/code> types
that your transform takes as input and produces as output. To take multiple
&lt;code>PCollection&lt;/code>s as input, or produce multiple &lt;code>PCollection&lt;/code>s as output, use one
of the multi-collection types for the relevant type parameter.&lt;/p>
&lt;p class="language-go">To create your own composite &lt;code>PTransform&lt;/code>, call the &lt;code>Scope&lt;/code> method on the current
pipeline scope variable. Transforms that are passed this new sub-&lt;code>Scope&lt;/code> become part of
the same composite &lt;code>PTransform&lt;/code>.&lt;/p>
&lt;p class="language-go">To reuse your composite transform, build it inside a normal Go function or method.
The function takes a scope and the input &lt;code>PCollection&lt;/code>s, and returns any
output &lt;code>PCollection&lt;/code>s it produces. &lt;strong>Note:&lt;/strong> Such functions cannot be passed directly to
&lt;code>ParDo&lt;/code> functions.&lt;/p>
&lt;p class="language-java language-py">The following code sample shows how to declare a &lt;code>PTransform&lt;/code> that accepts a
&lt;code>PCollection&lt;/code> of &lt;code>String&lt;/code>s for input, and outputs a &lt;code>PCollection&lt;/code> of &lt;code>Integer&lt;/code>s:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">ComputeWordLengths&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ComputeWordLengths&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pcoll&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Transform logic goes here.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">pcoll&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// CountWords is a function that builds a composite PTransform
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// to count the number of times each word appears.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="nf">CountWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">lines&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// A subscope is required for a function to become a composite transform.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// We assign it to the original scope variable s to shadow the original
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// for the rest of the CountWords function.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">s&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Scope&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;CountWords&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Since the same subscope is used for the following transforms,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// they are in the same composite PTransform.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Convert lines of text into individual words.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">words&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">extractWordsFn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">lines&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Count the number of times each word occurs.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">wordCounts&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">stats&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Count&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">words&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Return any PCollections that should be available after
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// the composite transform.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nx">wordCounts&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java language-py">Within your &lt;code>PTransform&lt;/code> subclass, you&amp;rsquo;ll need to override the &lt;code>expand&lt;/code> method.
The &lt;code>expand&lt;/code> method is where you add the processing logic for the &lt;code>PTransform&lt;/code>.
Your override of &lt;code>expand&lt;/code> must accept the appropriate type of input
&lt;code>PCollection&lt;/code> as a parameter, and specify the output &lt;code>PCollection&lt;/code> as the return
value.&lt;/p>
&lt;p class="language-java language-py">The following code sample shows how to override &lt;code>expand&lt;/code> for the
&lt;code>ComputeWordLengths&lt;/code> class declared in the previous example:&lt;/p>
&lt;p class="language-go">The following code sample shows how to call the &lt;code>CountWords&lt;/code> composite PTransform,
adding it to your pipeline:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">ComputeWordLengths&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// transform logic goes here
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ComputeWordLengths&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pcoll&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Transform logic goes here.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">pcoll&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">lines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="c1">// a PCollection of strings.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// A Composite PTransform function is called like any other function.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">wordCounts&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">CountWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">lines&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1">// returns a PCollection&amp;lt;KV&amp;lt;string,int&amp;gt;&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java language-py">As long as you override the &lt;code>expand&lt;/code> method in your &lt;code>PTransform&lt;/code> subclass to
accept the appropriate input &lt;code>PCollection&lt;/code>(s) and return the corresponding
output &lt;code>PCollection&lt;/code>(s), you can include as many transforms as you want. These
transforms can include core transforms, composite transforms, or the transforms
included in the Beam SDK libraries.&lt;/p>
&lt;p class="language-go">Your composite &lt;code>PTransform&lt;/code>s can include as many transforms as you want. These
transforms can include core transforms, other composite transforms, or the transforms
included in the Beam SDK libraries. They can also consume and return as many
&lt;code>PCollection&lt;/code>s as are necessary.&lt;/p>
&lt;p class="language-java language-py">Your composite transform&amp;rsquo;s parameters and return value must match the initial
input type and final return type for the entire transform, even if the
transform&amp;rsquo;s intermediate data changes type multiple times.&lt;/p>
&lt;p class="language-java language-py">&lt;strong>Note:&lt;/strong> The &lt;code>expand&lt;/code> method of a &lt;code>PTransform&lt;/code> is not meant to be invoked
directly by the user of a transform. Instead, you should call the &lt;code>apply&lt;/code> method
on the &lt;code>PCollection&lt;/code> itself, with the transform as an argument. This allows
transforms to be nested within the structure of your pipeline.&lt;/p>
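&lt;p>To make the relationship between applying a transform and its &lt;code>expand&lt;/code> method more concrete, here is a minimal sketch in plain Python. It is a toy model, not Beam code: &lt;code>ToyTransform&lt;/code> and &lt;code>ToyMap&lt;/code> are hypothetical stand-ins, and ordinary lists play the role of a &lt;code>PCollection&lt;/code>. The point is only that the caller pipes data into the transform, the framework calls &lt;code>expand&lt;/code>, and the composite's input and output types stay fixed even though an intermediate step has a different type.&lt;/p>

```python
class ToyTransform:
    """Toy stand-in for a PTransform; hypothetical, not part of Beam."""

    def expand(self, items):
        raise NotImplementedError

    def __ror__(self, items):
        # `items | transform` lands here, so the framework (not the caller)
        # is what invokes expand.
        return self.expand(items)


class ToyMap(ToyTransform):
    """Applies a function to every element of the input list."""

    def __init__(self, fn):
        self.fn = fn

    def expand(self, items):
        return [self.fn(item) for item in items]


class ComputeWordLengths(ToyTransform):
    """Composite: strings in, integers out, with a string-typed step between."""

    def expand(self, items):
        # Nested transforms are applied with the same pipe syntax.
        return items | ToyMap(str.strip) | ToyMap(len)


lengths = [" beam ", "io"] | ComputeWordLengths()
print(lengths)
```

&lt;p>In this sketch the caller never touches &lt;code>expand&lt;/code>; nesting works because each inner transform is applied the same way as the outer one.&lt;/p>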
&lt;h4 id="ptransform-style-guide">4.6.3. PTransform Style Guide&lt;/h4>
&lt;p>The &lt;a href="/contribute/ptransform-style-guide/">PTransform Style Guide&lt;/a>
contains additional information not included here, such as style guidelines,
logging and testing guidance, and language-specific considerations. The guide
is a useful starting point when you want to write new composite PTransforms.&lt;/p>
&lt;h2 id="pipeline-io">5. Pipeline I/O&lt;/h2>
&lt;p>When you create a pipeline, you often need to read data from some external
source, such as a file or a database. Likewise, you may
want your pipeline to output its result data to an external storage system.
Beam provides read and write transforms for a &lt;a href="/documentation/io/built-in/">number of common data storage
types&lt;/a>. If you want your pipeline
to read from or write to a data storage format that isn&amp;rsquo;t supported by the
built-in transforms, you can &lt;a href="/documentation/io/developing-io-overview/">implement your own read and write
transforms&lt;/a>.&lt;/p>
&lt;h3 id="pipeline-io-reading-data">5.1. Reading input data&lt;/h3>
&lt;p>Read transforms read data from an external source and return a &lt;code>PCollection&lt;/code>
representation of the data for use by your pipeline. You can use a read
transform at any point while constructing your pipeline to create a new
&lt;code>PCollection&lt;/code>, though it will be most common at the start of your pipeline.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gs://some/inputData.txt&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;gs://some/inputData.txt&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">lines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="err">&amp;#39;&lt;/span>&lt;span class="nx">gs&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="c1">//some/inputData.txt&amp;#39;)
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="pipeline-io-writing-data">5.2. Writing output data&lt;/h3>
&lt;p>Write transforms write the data in a &lt;code>PCollection&lt;/code> to an external data sink.
You will most often use write transforms at the end of your pipeline to output
your pipeline&amp;rsquo;s final results. However, you can use a write transform to output
a &lt;code>PCollection&lt;/code>&amp;rsquo;s data at any point in your pipeline.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">output&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gs://some/outputData&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">output&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WriteToText&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;gs://some/outputData&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="err">&amp;#39;&lt;/span>&lt;span class="nx">gs&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="c1">//some/inputData.txt&amp;#39;, output)
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="file-based-data">5.3. File-based input and output data&lt;/h3>
&lt;h4 id="file-based-reading-multiple-locations">5.3.1. Reading from multiple locations&lt;/h4>
&lt;p>Many read transforms support reading from multiple input files matching a glob
operator you provide. Note that glob operators are filesystem-specific and obey
filesystem-specific consistency models. The following TextIO example uses a glob
operator (&lt;code>*&lt;/code>) to read all matching input files that have prefix &amp;ldquo;input-&amp;rdquo; and the
suffix &amp;ldquo;.csv&amp;rdquo; in the given location:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;ReadFromText&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;protocol://my_bucket/path/to/input-*.csv&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ReadFromText&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadFromText&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;path/to/input-*.csv&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">lines&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">textio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;path/to/input-*.csv&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
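&lt;p>To see which file names a pattern such as &lt;code>input-*.csv&lt;/code> would select, the following sketch uses Python's standard &lt;code>fnmatch&lt;/code> module. This only imitates shell-style matching on a made-up list of names; it is not how Beam resolves patterns, which is left to the underlying filesystem.&lt;/p>

```python
from fnmatch import fnmatchcase

# Hypothetical file names sitting in the input location.
candidate_names = [
    "input-0001.csv",
    "input-b.csv",
    "input-0001.json",
    "output-0001.csv",
    "input-.csv",
]

pattern = "input-*.csv"

# Keep the names with the "input-" prefix and the ".csv" suffix;
# "*" matches any run of characters, including an empty one.
matching = [name for name in candidate_names if fnmatchcase(name, pattern)]
print(matching)
```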
&lt;p>To read data from disparate sources into a single &lt;code>PCollection&lt;/code>, read each one
independently and then use the &lt;a href="#flatten">Flatten&lt;/a> transform to create a single
&lt;code>PCollection&lt;/code>.&lt;/p>
&lt;h4 id="file-based-writing-multiple-files">5.3.2. Writing to multiple output files&lt;/h4>
&lt;p>For file-based output data, write transforms write to multiple output files by
default. When you pass an output file name to a write transform, the file name
is used as the prefix for all output files that the write transform produces.
To add a suffix to each output file name, specify one on the write transform.&lt;/p>
&lt;p>The following write transform example writes multiple output files to a
location. Each file has the prefix &amp;ldquo;numbers&amp;rdquo;, a numeric tag, and the suffix
&amp;ldquo;.csv&amp;rdquo;.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">records&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;WriteToText&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;protocol://my_bucket/path/to/numbers&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withSuffix&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;.csv&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">filtered_words&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;WriteToText&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WriteToText&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;/path/to/numbers&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">file_name_suffix&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;.csv&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The Go SDK textio doesn&amp;#39;t support sharding on writes yet.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// See https://github.com/apache/beam/issues/21031 for ways
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// to contribute a solution.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
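&lt;p>As a rough illustration of the resulting file names, the sketch below builds names from a prefix, a numeric shard tag, and a suffix. It assumes a tag of the form &lt;code>-SSSSS-of-NNNNN&lt;/code> (shard index and shard count, zero-padded); that format, the shard count of 3, and the helper &lt;code>shard_file_names&lt;/code> are assumptions made for the example. The actual names and number of files depend on the SDK, the runner, and how the write transform is configured.&lt;/p>

```python
def shard_file_names(prefix, suffix, num_shards):
    """Hypothetical helper: names of the form prefix-SSSSS-of-NNNNNsuffix."""
    return [
        f"{prefix}-{shard:05d}-of-{num_shards:05d}{suffix}"
        for shard in range(num_shards)
    ]


names = shard_file_names("numbers", ".csv", 3)
for name in names:
    print(name)
```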
&lt;h3 id="provided-io-transforms">5.4. Beam-provided I/O transforms&lt;/h3>
&lt;p>See the &lt;a href="/documentation/io/built-in/">Beam-provided I/O Transforms&lt;/a>
page for a list of the currently available I/O transforms.&lt;/p>
&lt;h2 id="schemas">6. Schemas&lt;/h2>
&lt;p>Often, the types of the records being processed have an obvious structure. Common Beam sources produce
JSON, Avro, Protocol Buffer, or database row objects; all of these types have well defined structures,
structures that can often be determined by examining the type. Even within an SDK pipeline, simple Java POJOs
(or equivalent structures in other languages) are often used as intermediate types, and these also have a
clear structure that can be inferred by inspecting the class. By understanding the structure of a pipeline’s
records, we can provide much more concise APIs for data processing.&lt;/p>
&lt;h3 id="what-is-a-schema">6.1. What is a schema?&lt;/h3>
&lt;p>Most structured records share some common characteristics:&lt;/p>
&lt;ul>
&lt;li>They can be subdivided into separate named fields. Fields usually have string names, but sometimes - as in the case of indexed
tuples - have numerical indices instead.&lt;/li>
&lt;li>There is a limited set of primitive types that a field can have. These often match primitive types in most programming
languages: int, long, string, etc.&lt;/li>
&lt;li>Often a field type can be marked as optional (sometimes referred to as nullable) or required.&lt;/li>
&lt;/ul>
&lt;p>Records often have a nested structure. A nested structure occurs when a field itself has subfields, so the
type of the field itself has a schema. Fields with array or map types are also a common feature of these structured
records.&lt;/p>
&lt;p>For example, consider the following schema, representing actions in a fictitious e-commerce company:&lt;/p>
&lt;p>&lt;strong>Purchase&lt;/strong>&lt;/p>
&lt;div class="table-container-wrapper">
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>userId&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>itemId&lt;/td>
&lt;td>INT64&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>shippingAddress&lt;/td>
&lt;td>ROW(ShippingAddress)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>cost&lt;/td>
&lt;td>INT64&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>transactions&lt;/td>
&lt;td>ARRAY[ROW(Transaction)]&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;/div>
&lt;p>&lt;strong>ShippingAddress&lt;/strong>&lt;/p>
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>streetAddress&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>city&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>state&lt;/td>
&lt;td>nullable STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>country&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>postCode&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;p>&lt;strong>Transaction&lt;/strong>&lt;/p>
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>bank&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>purchaseAmount&lt;/td>
&lt;td>DOUBLE&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;p>Purchase event records are represented by the above purchase schema. Each purchase event contains a shipping address, which
is a nested row containing its own schema. Each purchase also contains an array of credit-card transactions
(a list, because a purchase might be split across multiple credit cards); each item in the transaction list is a row
with its own schema.&lt;/p>
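&lt;p>For a concrete picture of such a nested record, here is a sketch in plain Python using &lt;code>typing.NamedTuple&lt;/code>. The snake_case field names are a Python-style rendering of the fields in the tables above, and the sample values are invented for illustration. The nullable &lt;code>state&lt;/code> field is modeled with &lt;code>Optional&lt;/code>, and the array of transactions with a sequence of nested rows.&lt;/p>

```python
from typing import NamedTuple, Optional, Sequence


class ShippingAddress(NamedTuple):
    street_address: str
    city: str
    state: Optional[str]  # nullable STRING
    country: str
    post_code: str


class Transaction(NamedTuple):
    bank: str
    purchase_amount: float  # DOUBLE


class Purchase(NamedTuple):
    user_id: str
    item_id: int
    shipping_address: ShippingAddress  # nested row
    cost: int
    transactions: Sequence[Transaction]  # array of nested rows


# Invented sample record: one purchase split across two cards.
purchase = Purchase(
    user_id="user-1",
    item_id=42,
    shipping_address=ShippingAddress(
        "1 Main St", "Springfield", None, "Freedonia", "0000"
    ),
    cost=1500,
    transactions=[Transaction("Bank A", 10.0), Transaction("Bank B", 5.0)],
)

total_paid = sum(t.purchase_amount for t in purchase.transactions)
print(purchase.shipping_address.city, total_paid)
```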
&lt;p>This provides an abstract description of the types involved, independent of any specific programming
language.&lt;/p>
&lt;p>Schemas provide a type system for Beam records that is independent of any specific programming-language type. There
might be multiple Java classes that all have the same schema (for example a Protocol-Buffer class or a POJO class),
and Beam will allow us to seamlessly convert between these types. Schemas also provide a simple way to reason about
types across different programming-language APIs.&lt;/p>
&lt;p>A &lt;code>PCollection&lt;/code> with a schema does not need to have a &lt;code>Coder&lt;/code> specified, as Beam knows how to encode and decode
Schema rows; Beam uses a special coder to encode schema types.&lt;/p>
&lt;h3 id="schemas-for-pl-types">6.2. Schemas for programming language types&lt;/h3>
&lt;p>While schemas themselves are language independent, they are designed to embed naturally into the programming languages
of the Beam SDK being used. This allows Beam users to continue using native types while reaping the advantage of
having Beam understand their element schemas.&lt;/p>
&lt;p class="language-java">In Java you could use the following set of classes to represent the purchase schema. Beam will automatically
infer the correct schema based on the members of the class.&lt;/p>
&lt;p class="language-py">In Python you can use the following set of classes to represent the purchase schema. Beam will automatically infer the correct schema based on the members of the class.&lt;/p>
&lt;p class="language-go">In Go, schema encoding is used by default for struct types, with exported fields becoming part of the schema.
Beam will automatically infer the schema based on the fields and field tags of the struct, and their order.&lt;/p>
&lt;p class="language-typescript">In TypeScript, JSON objects are used to represent schema&amp;rsquo;d data.
Unfortunately, type information in TypeScript is not propagated to the runtime layer,
so it needs to be manually specified in some places (e.g. when using cross-language pipelines).&lt;/p>
&lt;p class="language-yaml">In Beam YAML, all transforms produce and accept schema&amp;rsquo;d data which is used to validate the pipeline.&lt;/p>
&lt;p class="language-yaml">In some cases, Beam is unable to infer the output type of a mapping function.
When that happens, you can specify it manually using
&lt;a href="https://json-schema.org/understanding-json-schema/reference/type">JSON schema syntax&lt;/a>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@DefaultSchema&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JavaBeanSchema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">Purchase&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getUserId&lt;/span>&lt;span class="o">();&lt;/span> &lt;span class="c1">// Returns the id of the user who made the purchase.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">long&lt;/span> &lt;span class="nf">getItemId&lt;/span>&lt;span class="o">();&lt;/span> &lt;span class="c1">// Returns the identifier of the item that was purchased.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="n">ShippingAddress&lt;/span> &lt;span class="nf">getShippingAddress&lt;/span>&lt;span class="o">();&lt;/span> &lt;span class="c1">// Returns the shipping address, a nested type.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">long&lt;/span> &lt;span class="nf">getCostCents&lt;/span>&lt;span class="o">();&lt;/span> &lt;span class="c1">// Returns the cost of the item.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Transaction&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">getTransactions&lt;/span>&lt;span class="o">();&lt;/span> &lt;span class="c1">// Returns the transactions that paid for this purchase (returns a list, since the purchase might be spread out over multiple credit cards).
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@SchemaCreate&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="nf">Purchase&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">userId&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span> &lt;span class="n">itemId&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ShippingAddress&lt;/span> &lt;span class="n">shippingAddress&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span> &lt;span class="n">costCents&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Transaction&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">transactions&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@DefaultSchema&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JavaBeanSchema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">ShippingAddress&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getStreetAddress&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getCity&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Nullable&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getState&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getCountry&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getPostCode&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@SchemaCreate&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="nf">ShippingAddress&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">streetAddress&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">city&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="nd">@Nullable&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">country&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="n">postCode&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@DefaultSchema&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JavaBeanSchema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">Transaction&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getBank&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">double&lt;/span> &lt;span class="nf">getPurchaseAmount&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@SchemaCreate&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="nf">Transaction&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">bank&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">double&lt;/span> &lt;span class="n">purchaseAmount&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">typing&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Nested types are defined first so that Purchase can reference them.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ShippingAddress&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">NamedTuple&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">street_address&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">city&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">state&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Optional&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">country&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">postal_code&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">Transaction&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">NamedTuple&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">bank&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">purchase_amount&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">float&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">Purchase&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">NamedTuple&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">user_id&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span> &lt;span class="c1"># The id of the user who made the purchase.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">item_id&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">int&lt;/span> &lt;span class="c1"># The identifier of the item that was purchased.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">shipping_address&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">ShippingAddress&lt;/span> &lt;span class="c1"># The shipping address, a nested type.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">cost_cents&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">int&lt;/span> &lt;span class="c1"># The cost of the item.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">transactions&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sequence&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">Transaction&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1"># The transactions that paid for this purchase (a list, since the purchase might be spread out over multiple credit cards).&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">Purchase&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ID of the user who made the purchase.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">UserID&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="s">`beam:&amp;#34;userId&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Identifier of the item that was purchased.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">ItemID&lt;/span> &lt;span class="kt">int64&lt;/span> &lt;span class="s">`beam:&amp;#34;itemId&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The shipping address, a nested type.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">ShippingAddress&lt;/span> &lt;span class="nx">ShippingAddress&lt;/span> &lt;span class="s">`beam:&amp;#34;shippingAddress&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The cost of the item in cents.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">Cost&lt;/span> &lt;span class="kt">int64&lt;/span> &lt;span class="s">`beam:&amp;#34;cost&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The transactions that paid for this purchase.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// A slice since the purchase might be spread out over multiple
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// credit cards.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">Transactions&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="nx">Transaction&lt;/span> &lt;span class="s">`beam:&amp;#34;transactions&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">ShippingAddress&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">StreetAddress&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="s">`beam:&amp;#34;streetAddress&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">City&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="s">`beam:&amp;#34;city&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">State&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="kt">string&lt;/span> &lt;span class="s">`beam:&amp;#34;state&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Country&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="s">`beam:&amp;#34;country&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">PostCode&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="s">`beam:&amp;#34;postCode&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">Transaction&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Bank&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="s">`beam:&amp;#34;bank&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">PurchaseAmount&lt;/span> &lt;span class="kt">float64&lt;/span> &lt;span class="s">`beam:&amp;#34;purchaseAmount&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">intField&lt;/span>: &lt;span class="kt">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">stringField&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;a&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">intField&lt;/span>: &lt;span class="kt">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">stringField&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;b&amp;#34;&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Let beam know the type of the elements by providing an exemplar.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">withRowCoder&lt;/span>&lt;span class="p">({&lt;/span> &lt;span class="nx">intField&lt;/span>: &lt;span class="kt">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">stringField&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;&amp;#34;&lt;/span> &lt;span class="p">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">MapToFields&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">language&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">python&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">fields&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">new_field&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">expression&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s2">&amp;#34;hex(weight)&amp;#34;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">output_type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>{&lt;span class="w"> &lt;/span>&lt;span class="nt">&amp;#34;type&amp;#34;: &lt;/span>&lt;span class="s2">&amp;#34;string&amp;#34;&lt;/span>&lt;span class="w"> &lt;/span>}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">Using JavaBean classes as above is one way to map a schema to Java classes. However, multiple Java classes might have
the same schema, in which case the different Java types can often be used interchangeably. Beam will add implicit
conversions between types that have matching schemas. For example, the above
&lt;code>Transaction&lt;/code> class has the same schema as the following class:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@DefaultSchema&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JavaFieldSchema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">TransactionPojo&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">bank&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">double&lt;/span> &lt;span class="n">purchaseAmount&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">So if we had two &lt;code>PCollection&lt;/code>s as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Transaction&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">transactionBeans&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readTransactionsAsJavaBean&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">TransactionPojo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">transactionPojos&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readTransactionsAsPojo&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">Then these two &lt;code>PCollection&lt;/code>s would have the same schema, even though their Java types would be different. This means,
for example, that the following two code snippets are valid:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">transactionBeans&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;...&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">TransactionPojo&lt;/span> &lt;span class="n">pojo&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">and&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">transactionPojos&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;...&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">Transaction&lt;/span> &lt;span class="n">row&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">In both cases the &lt;code>@Element&lt;/code> parameter differs from the &lt;code>PCollection&lt;/code>&amp;rsquo;s Java type, but because the
schemas are the same, Beam will automatically make the conversion. The built-in &lt;code>Convert&lt;/code> transform can also be used
to translate between Java types of equivalent schemas, as detailed below.&lt;/p>
&lt;h3 id="schema-definition">6.3. Schema definition&lt;/h3>
&lt;p>The schema for a &lt;code>PCollection&lt;/code> defines elements of that &lt;code>PCollection&lt;/code> as an ordered list of named fields. Each field
has a name, a type, and possibly a set of user options. The type of a field can be primitive or composite. The following
are the primitive types currently supported by Beam:&lt;/p>
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>Type&lt;/th>
&lt;th>Description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>BYTE&lt;/td>
&lt;td>An 8-bit signed value&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>INT16&lt;/td>
&lt;td>A 16-bit signed value&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>INT32&lt;/td>
&lt;td>A 32-bit signed value&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>INT64&lt;/td>
&lt;td>A 64-bit signed value&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DECIMAL&lt;/td>
&lt;td>An arbitrary-precision decimal type&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>FLOAT&lt;/td>
&lt;td>A 32-bit IEEE 754 floating point number&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DOUBLE&lt;/td>
&lt;td>A 64-bit IEEE 754 floating point number&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>STRING&lt;/td>
&lt;td>A string&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>DATETIME&lt;/td>
&lt;td>A timestamp represented as milliseconds since the epoch&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>BOOLEAN&lt;/td>
&lt;td>A boolean value&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>BYTES&lt;/td>
&lt;td>A raw byte array&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;p>A field can also reference a nested schema. In this case, the field will have type ROW, and the nested schema will
be an attribute of this field type.&lt;/p>
&lt;p>Three collection types are supported as field types: ARRAY, ITERABLE and MAP:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>ARRAY&lt;/strong> This represents a repeated value type, where the repeated elements can have any supported type. Arrays of
nested rows are supported, as are arrays of arrays.&lt;/li>
&lt;li>&lt;strong>ITERABLE&lt;/strong> This is very similar to the array type: it represents a repeated value, but one in which the full list of
items is not known until iterated over. This is intended for the case where an iterable might be larger than the
available memory and is backed by external storage (for example, this can happen with the iterable returned by a
&lt;code>GroupByKey&lt;/code>). The repeated elements can have any supported type.&lt;/li>
&lt;li>&lt;strong>MAP&lt;/strong> This represents an associative map from keys to values. All schema types are supported for both keys and values.
Values that contain map types cannot be used as keys in any grouping operation.&lt;/li>
&lt;/ul>
&lt;h3 id="logical-types">6.4. Logical types&lt;/h3>
&lt;p>Users can extend the schema type system to add custom logical types that can be used as a field. A logical type is
identified by a unique identifier and an argument. A logical type also specifies an underlying schema type to be used
for storage, along with conversions to and from that type. As an example, a logical union can always be represented as
a row with nullable fields, where the user ensures that only one of those fields is ever set at a time. However, this can
be tedious and complex to manage. The OneOf logical type provides a value class that makes it easier to manage the type
as a union, while still using a row with nullable fields as its underlying storage. Because each logical type has a
unique identifier, it can be interpreted by other languages as well. More examples of logical types are listed
below.&lt;/p>
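&lt;p>The union idea can be sketched in plain Java, without the Beam API: a value class wraps two nullable fields and guarantees that exactly one of them is set. &lt;code>StringOrLong&lt;/code> and its methods are hypothetical names chosen for illustration, and the OneOf logical type itself may be implemented differently.&lt;/p>

```java
import java.util.Objects;

// Hypothetical value class for illustration; not part of the Beam API.
final class StringOrLong {
  // Underlying storage: two nullable fields, exactly one of which is non-null.
  private final String stringValue;
  private final Long longValue;

  // Private so that callers can only build instances through the factories below.
  private StringOrLong(String stringValue, Long longValue) {
    this.stringValue = stringValue;
    this.longValue = longValue;
  }

  static StringOrLong ofString(String value) {
    return new StringOrLong(Objects.requireNonNull(value), null);
  }

  static StringOrLong ofLong(long value) {
    return new StringOrLong(null, value);
  }

  // Reports which case of the union is set.
  boolean isString() {
    return stringValue != null;
  }

  String getString() {
    if (stringValue == null) {
      throw new IllegalStateException("union holds a long");
    }
    return stringValue;
  }

  long getLong() {
    if (longValue == null) {
      throw new IllegalStateException("union holds a string");
    }
    return longValue;
  }
}
```

&lt;p>Because the constructor is private, the "only one field is set" rule is enforced in one place instead of by every caller that builds a row.&lt;/p>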
&lt;h4 id="defining-a-logical-type">6.4.1. Defining a logical type&lt;/h4>
&lt;p>To define a logical type you must specify a Schema type to be used to represent the underlying type as well as a unique
identifier for that type. A logical type imposes additional semantics on top of a schema type. For example, a logical
type to represent nanosecond timestamps is represented as a schema containing an INT64 and an INT32 field. This schema
alone does not say anything about how to interpret this type; the logical type, however, tells you that this represents
a nanosecond timestamp, with the INT64 field representing seconds and the INT32 field representing nanoseconds.&lt;/p>
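&lt;p>As a rough illustration of that split, the plain-Java sketch below stores a &lt;code>java.time.Instant&lt;/code> as a seconds value and a nanoseconds value and then rebuilds it. It deliberately avoids the Beam API; &lt;code>NanosStorageSketch&lt;/code> is a hypothetical name, and only the interpretation of the two fields is taken from the text above.&lt;/p>

```java
import java.time.Instant;

// Hypothetical illustration only; not part of the Beam API.
final class NanosStorageSketch {
  // Storage form: the values that the INT64 and INT32 fields would hold.
  final long seconds;
  final int nanos;

  NanosStorageSketch(long seconds, int nanos) {
    this.seconds = seconds;
    this.nanos = nanos;
  }

  // Split an Instant into whole seconds since the epoch plus the nanosecond remainder.
  static NanosStorageSketch fromInstant(Instant instant) {
    return new NanosStorageSketch(instant.getEpochSecond(), instant.getNano());
  }

  // Apply the logical type's interpretation of the two fields to recover the Instant.
  Instant toInstant() {
    return Instant.ofEpochSecond(seconds, nanos);
  }
}
```

&lt;p>The two numbers alone are just a pair of integers; the conversion methods carry the meaning that the logical type adds.&lt;/p>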
&lt;p>Logical types are also specified by an argument, which allows creating a class of related types. For example, a
limited-precision decimal type would have an integer argument indicating how many digits of precision are represented.
The argument is represented by a schema type, so it can itself be a complex type.&lt;/p>
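&lt;p>The role of the argument can be sketched in plain Java. In the example below, a single class describes a whole family of limited-precision decimal types, one per precision value. &lt;code>FixedPrecisionDecimal&lt;/code> and its methods are hypothetical and do not correspond to a Beam class; the choice of &lt;code>RoundingMode.HALF_UP&lt;/code> is likewise an assumption made for the illustration.&lt;/p>

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Hypothetical illustration only; not part of the Beam API.
final class FixedPrecisionDecimal {
  // Built from the type's argument: the number of significant digits to keep.
  private final MathContext context;

  FixedPrecisionDecimal(int precision) {
    this.context = new MathContext(precision, RoundingMode.HALF_UP);
  }

  // The argument that distinguishes this type from others in the same family.
  int precision() {
    return context.getPrecision();
  }

  // Convert a value to the form this type would store, rounding to the configured precision.
  BigDecimal toBase(BigDecimal value) {
    return value.round(context);
  }
}
```

&lt;p>Two instances built with different precisions behave as two distinct but related types, which is what the argument is meant to express.&lt;/p>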
&lt;p class="language-java">In Java, a logical type is specified as an implementation of the &lt;code>LogicalType&lt;/code> interface. A custom Java class can be specified to represent the logical type, and conversion functions must be supplied to convert back and forth between this Java class and the underlying Schema type representation. For example, the logical type representing nanosecond timestamps might be implemented as follows:&lt;/p>
&lt;p class="language-go">In Go, a logical type is specified with a custom implementation of the &lt;code>beam.SchemaProvider&lt;/code> interface.
For example, the logical type provider representing nanosecond timestamps
might be implemented as follows:&lt;/p>
&lt;p class="language-typescript">In TypeScript, a logical type is defined by the &lt;a href="https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/coders/row_coder.ts">LogicalTypeInfo&lt;/a>
interface, which associates a logical type&amp;rsquo;s URN with its representation
and its conversion to and from this representation.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// A Logical type using java.time.Instant to represent the logical type.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">TimestampNanos&lt;/span> &lt;span class="kd">implements&lt;/span> &lt;span class="n">LogicalType&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Instant&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Row&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The underlying schema used to represent rows.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">Schema&lt;/span> &lt;span class="n">schema&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Schema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">builder&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">addInt64Field&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;seconds&amp;#34;&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">addInt32Field&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;nanos&amp;#34;&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getIdentifier&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="s">&amp;#34;timestampNanos&amp;#34;&lt;/span>&lt;span class="o">;&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="n">FieldType&lt;/span> &lt;span class="nf">getBaseType&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="n">FieldType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">row&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">schema&lt;/span>&lt;span class="o">);&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Convert the representation type to the underlying Row type. Called by Beam when necessary.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@Override&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="n">Row&lt;/span> &lt;span class="nf">toBaseType&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Instant&lt;/span> &lt;span class="n">instant&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">Row&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">withSchema&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">schema&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">addValues&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">instant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getEpochSecond&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="n">instant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getNano&lt;/span>&lt;span class="o">()).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Convert the underlying Row type to an Instant. Called by Beam when necessary.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@Override&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="nf">toInputType&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Row&lt;/span> &lt;span class="n">row&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ofEpochSecond&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">row&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getInt64&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;seconds&amp;#34;&lt;/span>&lt;span class="o">),&lt;/span> &lt;span class="n">row&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getInt32&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;nanos&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Define a logical provider like so:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// TimestampNanos is a logical type using time.Time, but
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// encodes as a schema type.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">TimestampNanos&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Time&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">tn&lt;/span> &lt;span class="nx">TimestampNanos&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">Seconds&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="kt">int64&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Time&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tn&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nf">Unix&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">tn&lt;/span> &lt;span class="nx">TimestampNanos&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">Nanos&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="kt">int32&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">int32&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Time&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tn&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nf">UnixNano&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">%&lt;/span> &lt;span class="mi">1000000000&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// tnStorage is the storage schema for TimestampNanos.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">tnStorage&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Seconds&lt;/span> &lt;span class="kt">int64&lt;/span> &lt;span class="s">`beam:&amp;#34;seconds&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Nanos&lt;/span> &lt;span class="kt">int32&lt;/span> &lt;span class="s">`beam:&amp;#34;nanos&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// reflect.Type of the Value type of TimestampNanos
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">tnType&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">TypeOf&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">TimestampNanos&lt;/span>&lt;span class="p">)(&lt;/span>&lt;span class="kc">nil&lt;/span>&lt;span class="p">)).&lt;/span>&lt;span class="nf">Elem&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">tnStorageType&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">TypeOf&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">tnStorage&lt;/span>&lt;span class="p">)(&lt;/span>&lt;span class="kc">nil&lt;/span>&lt;span class="p">)).&lt;/span>&lt;span class="nf">Elem&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// TimestampNanosProvider implements the beam.SchemaProvider interface.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">TimestampNanosProvider&lt;/span> &lt;span class="kd">struct&lt;/span>&lt;span class="p">{}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// FromLogicalType checks if the given type is TimestampNanos, and if so
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// returns the storage type.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">TimestampNanosProvider&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">FromLogicalType&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rt&lt;/span> &lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Type&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Type&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">error&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">rt&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="nx">tnType&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Errorf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;unable to provide schema.LogicalType for type %v, want %v&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">rt&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tnType&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">tnStorageType&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// BuildEncoder builds a Beam schema encoder for the TimestampNanos type.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">TimestampNanosProvider&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">BuildEncoder&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rt&lt;/span> &lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Type&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">any&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">io&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Writer&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">error&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">error&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">FromLogicalType&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rt&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">enc&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">coder&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RowEncoderForStruct&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tnStorageType&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">iface&lt;/span> &lt;span class="nx">any&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">io&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Writer&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">v&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">iface&lt;/span>&lt;span class="p">.(&lt;/span>&lt;span class="nx">TimestampNanos&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nf">enc&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tnStorage&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Seconds&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">v&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Seconds&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Nanos&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">v&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Nanos&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">w&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// BuildDecoder builds a Beam schema decoder for the TimestampNanos type.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">TimestampNanosProvider&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">BuildDecoder&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rt&lt;/span> &lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Type&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">io&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Reader&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">any&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">error&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="kt">error&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">FromLogicalType&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rt&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">dec&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">coder&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RowDecoderForStruct&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tnStorageType&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">r&lt;/span> &lt;span class="nx">io&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Reader&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">any&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">error&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">dec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">r&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">tn&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.(&lt;/span>&lt;span class="nx">tnStorage&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nf">TimestampNanos&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Unix&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Seconds&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">int64&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Nanos&lt;/span>&lt;span class="p">))),&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Register it like so:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RegisterSchemaProvider&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tnType&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">TimestampNanosProvider&lt;/span>&lt;span class="p">{})&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Register a logical type:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">class&lt;/span> &lt;span class="nx">Foo&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kr">constructor&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kr">public&lt;/span> &lt;span class="nx">value&lt;/span>: &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">requireForSerialization&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;apache-beam&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">Foo&lt;/span> &lt;span class="p">});&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">row_coder&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">registerLogicalType&lt;/span>&lt;span class="p">({&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">urn&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;beam:logical_type:typescript_foo:v1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">reprType&lt;/span>: &lt;span class="kt">row_coder.RowCoder.inferTypeFromJSON&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;string&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kc">false&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">toRepr&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">foo&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="nx">foo&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">value&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fromRepr&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">value&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">=&amp;gt;&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="nx">Foo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">value&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">});&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// And use it as follows:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="nx">Foo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;a&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="nx">Foo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;b&amp;#34;&lt;/span>&lt;span class="p">)]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Use beamLogicalType in the exemplar to indicate its use.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">withRowCoder&lt;/span>&lt;span class="p">({&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beamLogicalType&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;beam:logical_type:typescript_foo:v1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span> &lt;span class="kr">as&lt;/span> &lt;span class="kt">any&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="built-in-logical-types">6.4.2. Useful logical types&lt;/h4>
&lt;p class="language-py">Currently the Python SDK provides minimal convenience logical types,
other than to handle &lt;code>MicrosInstant&lt;/code>.&lt;/p>
&lt;p class="language-go">Currently the Go SDK provides minimal convenience logical types,
other than to handle additional integer primitives, and &lt;code>time.Time&lt;/code>.&lt;/p>
&lt;h5 id="enumerationtype">&lt;strong>EnumerationType&lt;/strong>&lt;/h5>
&lt;p class="language-py">This convenience builder doesn&amp;rsquo;t yet exist for the Python SDK.&lt;/p>
&lt;p class="language-go">This convenience builder doesn&amp;rsquo;t yet exist for the Go SDK.&lt;/p>
&lt;p class="language-java">This logical type allows creating an enumeration type consisting of a set of named constants.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Schema&lt;/span> &lt;span class="n">schema&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Schema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">builder&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="err">…&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">addLogicalTypeField&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;color&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">EnumerationType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;RED&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;GREEN&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;BLUE&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">The value of this field is stored in the row as an INT32 type, however the logical type defines a value type that lets
you access the enumeration either as a string or a value. For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">EnumerationType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Value&lt;/span> &lt;span class="n">enumValue&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">enumType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">valueOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;RED&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">enumValue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">();&lt;/span> &lt;span class="c1">// Returns 0, the integer value of the constant.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">enumValue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">toString&lt;/span>&lt;span class="o">();&lt;/span> &lt;span class="c1">// Returns &amp;#34;RED&amp;#34;, the string value of the constant
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">Given a row object with an enumeration field, you can also extract the field as the enumeration value.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">EnumerationType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Value&lt;/span> &lt;span class="n">enumValue&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">row&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getLogicalTypeValue&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;color&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">EnumerationType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Value&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">Automatic schema inference from Java POJOs and JavaBeans automatically converts Java enums to EnumerationType logical
types.&lt;/p>
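&lt;p class="language-java">The storage idea described above (an integer in the row, a name at the API surface) can be illustrated with a minimal plain-Java sketch. The class &lt;code>EnumSketch&lt;/code> and its methods are hypothetical names used only for illustration; this is not Beam's &lt;code>EnumerationType&lt;/code> implementation, and it assumes constants are numbered from 0 in declaration order, as in the RED example above.&lt;/p>

```java
import java.util.Arrays;

// Hypothetical stand-in for an enumeration logical type.
// This is NOT Beam's EnumerationType; it only mirrors the int-backed idea.
final class EnumSketch {
    private final String[] names;

    EnumSketch(String... names) {
        this.names = names.clone();
    }

    // Maps a constant name to its storage value (its position in the declaration).
    int toStorage(String name) {
        int index = Arrays.asList(names).indexOf(name);
        if (index == -1) {
            throw new IllegalArgumentException("Unknown constant: " + name);
        }
        return index;
    }

    // Maps a stored integer back to the constant name.
    String toName(int stored) {
        return names[stored];
    }
}
```

&lt;p class="language-java">With &lt;code>new EnumSketch("RED", "GREEN", "BLUE")&lt;/code>, &lt;code>toStorage("RED")&lt;/code> yields 0 and &lt;code>toName(1)&lt;/code> yields "GREEN", so only the integer needs to be encoded in the row.&lt;/p>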
&lt;h5 id="oneoftype">&lt;strong>OneOfType&lt;/strong>&lt;/h5>
&lt;p class="language-py">This convenience builder doesn&amp;rsquo;t yet exist for the Python SDK.&lt;/p>
&lt;p class="language-go">This convenience builder doesn&amp;rsquo;t yet exist for the Go SDK.&lt;/p>
&lt;p class="language-java">OneOfType allows creating a disjoint union type over a set of schema fields. For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Schema&lt;/span> &lt;span class="n">schema&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Schema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">builder&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="err">…&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">addLogicalTypeField&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;oneOfField&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">OneOfType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Field&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;intField&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">FieldType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">INT32&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Field&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;stringField&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">FieldType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">STRING&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Field&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;bytesField&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">FieldType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">BYTES&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">The value of this field is stored in the row as another Row type, where all the fields are marked as nullable. The
logical type however defines a Value object that contains an enumeration value indicating which field was set and allows
getting just that field:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Returns an enumeration indicating all possible case values for the enum.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// For the above example, this will be
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// EnumerationType.create(&amp;#34;intField&amp;#34;, &amp;#34;stringField&amp;#34;, &amp;#34;bytesField&amp;#34;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">EnumerationType&lt;/span> &lt;span class="n">oneOfEnum&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">onOfType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getCaseEnumType&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Creates an instance of the union with the string field set.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">OneOfType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Value&lt;/span> &lt;span class="n">oneOfValue&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">oneOfType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">createValue&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;stringField&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;foobar&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Handle the oneof
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">switch&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">oneOfValue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getCaseEnumType&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">toString&lt;/span>&lt;span class="o">())&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="s">&amp;#34;intField&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">processInt&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">oneOfValue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="s">&amp;#34;stringField&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">processString&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">oneOfValue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="s">&amp;#34;bytesField&amp;#34;&lt;/span>&lt;span class="o">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">processBytes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">oneOfValue&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">bytes&lt;/span>&lt;span class="o">[].&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">In the above example we used the field names in the switch statement for clarity, however the enum integer values could
also be used.&lt;/p>
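&lt;p class="language-java">To make the representation concrete, here is a minimal plain-Java sketch of a tagged union that records which case is set, exposes the case as an integer, and produces a row-like array in which only the selected slot is non-null. The class &lt;code>OneOfSketch&lt;/code> and all of its methods are hypothetical and are not part of Beam; the sketch only mirrors the idea described above and switches on the integer case value rather than the field name.&lt;/p>

```java
import java.util.Arrays;

// Hypothetical stand-in for a one-of value; this is NOT Beam's OneOfType.Value.
final class OneOfSketch {
    private final String[] caseNames;
    private final int caseIndex;
    private final Object value;

    OneOfSketch(String[] caseNames, String caseName, Object value) {
        int index = Arrays.asList(caseNames).indexOf(caseName);
        if (index == -1) {
            throw new IllegalArgumentException("Unknown case: " + caseName);
        }
        this.caseNames = caseNames.clone();
        this.caseIndex = index;
        this.value = value;
    }

    // Integer value of the case that is set (position in the declaration).
    int caseIndex() {
        return caseIndex;
    }

    // Name of the case that is set.
    String caseName() {
        return caseNames[caseIndex];
    }

    // Row-like representation: one nullable slot per case, only the set case is non-null.
    Object[] toRow() {
        Object[] row = new Object[caseNames.length];
        row[caseIndex] = value;
        return row;
    }

    // Dispatches on the integer case value instead of the field name.
    static String describe(OneOfSketch v) {
        switch (v.caseIndex()) {
            case 0:
                return "int:" + v.value;
            case 1:
                return "string:" + v.value;
            case 2:
                return "bytes:" + ((byte[]) v.value).length;
            default:
                throw new IllegalStateException("Unexpected case index");
        }
    }
}
```

&lt;p class="language-java">Switching on the integer avoids string comparisons, at the cost of coupling the code to the order in which the cases were declared.&lt;/p>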
&lt;h3 id="creating-schemas">6.5. Creating Schemas&lt;/h3>
&lt;p>To take advantage of schemas, each &lt;code>PCollection&lt;/code> must have a schema attached to it.
Often, the source itself attaches a schema to the &lt;code>PCollection&lt;/code>.
For example, when using &lt;code>AvroIO&lt;/code> to read Avro files, the source can automatically infer a Beam schema from the Avro schema and attach it to the Beam &lt;code>PCollection&lt;/code>.
However, not all sources produce schemas.
In addition, Beam pipelines often have intermediate stages and types, which can also benefit from the expressiveness of schemas.&lt;/p>
&lt;h4 id="inferring-schemas">6.5.1. Inferring schemas&lt;/h4>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;li data-value="go">Go SDK&lt;/li>
&lt;li data-value="typescript">TypeScript SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;p class="language-typescript">Unfortunately, Beam is unable to access Typescript&amp;rsquo;s type information at runtime.
Schemas must be manually declared with &lt;code>beam.withRowCoder&lt;/code>.
On the other hand, schema-aware operations such as &lt;code>GroupBy&lt;/code> can be used
without an explicit schema declared.&lt;/p>
&lt;p class="language-java">Beam is able to infer schemas from a variety of common Java types.
The &lt;code>@DefaultSchema&lt;/code> annotation can be used to tell Beam to infer schemas from a specific type.
The annotation takes a &lt;code>SchemaProvider&lt;/code> as an argument, and &lt;code>SchemaProvider&lt;/code> classes are already built in for common Java types.
The &lt;code>SchemaRegistry&lt;/code> can also be invoked programmatically for cases where it is not practical to annotate the Java type itself.&lt;/p>
&lt;p class="language-java">&lt;strong>Java POJOs&lt;/strong>&lt;/p>
&lt;p class="language-java">A POJO (Plain Old Java Object) is a Java object that is not bound by any restriction other than the Java Language
Specification. A POJO can contain member variables that are primitives, that are other POJOs, or are collections maps or
arrays thereof. POJOs do not have to extend prespecified classes or extend any specific interfaces.&lt;/p>
&lt;p class="language-java">If a POJO class is annotated with &lt;code>@DefaultSchema(JavaFieldSchema.class)&lt;/code>, Beam will automatically infer a schema for
this class. Nested classes are supported as are classes with &lt;code>List&lt;/code>, array, and &lt;code>Map&lt;/code> fields.&lt;/p>
&lt;p class="language-java">For example, annotating the following class tells Beam to infer a schema from this POJO class and apply it to any
&lt;code>PCollection&amp;lt;TransactionPojo&amp;gt;&lt;/code>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@DefaultSchema&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JavaFieldSchema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">TransactionPojo&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">bank&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="kt">double&lt;/span> &lt;span class="n">purchaseAmount&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@SchemaCreate&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="nf">TransactionPojo&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">bank&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">double&lt;/span> &lt;span class="n">purchaseAmount&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">bank&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">bank&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">purchaseAmount&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">purchaseAmount&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Beam will automatically infer the correct schema for this PCollection. No coder is needed as a result.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">TransactionPojo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">pojos&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPojos&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">The &lt;code>@SchemaCreate&lt;/code> annotation tells Beam that this constructor can be used to create instances of TransactionPojo,
assuming that constructor parameters have the same names as the field names. &lt;code>@SchemaCreate&lt;/code> can also be used to annotate
static factory methods on the class, allowing the constructor to remain private. If there is no &lt;code>@SchemaCreate&lt;/code>
annotation then all the fields must be non-final and the class must have a zero-argument constructor.&lt;/p>
&lt;p class="language-java">There are a couple of other useful annotations that affect how Beam infers schemas. By default the schema field names
inferred will match that of the class field names. However &lt;code>@SchemaFieldName&lt;/code> can be used to specify a different name to
be used for the schema field. &lt;code>@SchemaIgnore&lt;/code> can be used to mark specific class fields as excluded from the inferred
schema. For example, it’s common to have ephemeral fields in a class that should not be included in a schema
(e.g. caching the hash value to prevent expensive recomputation of the hash), and &lt;code>@SchemaIgnore&lt;/code> can be used to
exclude these fields. Note that ignored fields will not be included in the encoding of these records.&lt;/p>
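&lt;p class="language-java">The effect of renaming and ignoring fields during inference can be sketched in plain Java with reflection. Everything below is hypothetical: &lt;code>Rename&lt;/code> and &lt;code>Skip&lt;/code> are stand-ins for &lt;code>@SchemaFieldName&lt;/code> and &lt;code>@SchemaIgnore&lt;/code>, and &lt;code>FieldInferenceSketch&lt;/code> is not how Beam implements &lt;code>JavaFieldSchema&lt;/code>. The sketch assumes that every non-static declared field contributes one schema field, and it sorts the result because reflection does not guarantee field order.&lt;/p>

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;

// Hypothetical stand-in for @SchemaFieldName.
@Retention(RetentionPolicy.RUNTIME)
@interface Rename {
    String value();
}

// Hypothetical stand-in for @SchemaIgnore.
@Retention(RetentionPolicy.RUNTIME)
@interface Skip {}

// Example class: one plain field, one renamed field, one ignored field.
class TransactionSketch {
    public String bank;

    @Rename("amount")
    public double purchaseAmount;

    @Skip
    public int cachedHash;
}

final class FieldInferenceSketch {
    // Returns sorted "name:type" entries for every non-static, non-skipped field.
    static String[] infer(Class type) {
        Field[] fields = type.getDeclaredFields();
        String[] entries = new String[fields.length];
        int count = 0;
        for (Field field : fields) {
            if (Modifier.isStatic(field.getModifiers())) continue;
            if (field.isSynthetic()) continue;
            if (field.isAnnotationPresent(Skip.class)) continue;
            Rename rename = field.getAnnotation(Rename.class);
            String name = (rename != null) ? rename.value() : field.getName();
            entries[count++] = name + ":" + field.getType().getSimpleName();
        }
        String[] result = Arrays.copyOf(entries, count);
        Arrays.sort(result);
        return result;
    }
}
```

&lt;p class="language-java">Calling &lt;code>FieldInferenceSketch.infer(TransactionSketch.class)&lt;/code> yields two entries, &lt;code>amount:double&lt;/code> and &lt;code>bank:String&lt;/code>; the skipped field does not appear, which is why it would also be absent from any encoding derived from the schema.&lt;/p>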
&lt;p class="language-java">In some cases it is not convenient to annotate the POJO class, for example if the POJO is in a different package that is
not owned by the Beam pipeline author. In these cases the schema inference can be triggered programmatically in
pipeline’s main function as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getSchemaRegistry&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">registerPOJO&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TransactionPOJO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">&lt;strong>Java Beans&lt;/strong>&lt;/p>
&lt;p class="language-java">Java Beans are a de-facto standard for creating reusable property classes in Java. While the full
standard has many characteristics, the key ones are that all properties are accessed via getter and setter classes, and
the name format for these getters and setters is standardized. A Java Bean class can be annotated with
&lt;code>@DefaultSchema(JavaBeanSchema.class)&lt;/code> and Beam will automatically infer a schema for this class. For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@DefaultSchema&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JavaBeanSchema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">TransactionBean&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="nf">TransactionBean&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="err">…&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getBank&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="err">…&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">setBank&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">bank&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="err">…&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">double&lt;/span> &lt;span class="nf">getPurchaseAmount&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="err">…&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">setPurchaseAmount&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="kt">double&lt;/span> &lt;span class="n">purchaseAmount&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="err">…&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Beam will automatically infer the correct schema for this PCollection. No coder is needed as a result.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">TransactionBean&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">beans&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readBeans&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">The &lt;code>@SchemaCreate&lt;/code> annotation can be used to specify a constructor or a static factory method, in which case the
setters and zero-argument constructor can be omitted.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@DefaultSchema&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JavaBeanSchema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">TransactionBean&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@SchemaCreate&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Public&lt;/span> &lt;span class="nf">TransactionBean&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">bank&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">double&lt;/span> &lt;span class="n">purchaseAmount&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="err">…&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getBank&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="err">…&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">double&lt;/span> &lt;span class="nf">getPurchaseAmount&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span> &lt;span class="err">…&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">&lt;code>@SchemaFieldName&lt;/code> and &lt;code>@SchemaIgnore&lt;/code> can be used to alter the schema inferred, just like with POJO classes.&lt;/p>
&lt;p class="language-java">&lt;strong>AutoValue&lt;/strong>&lt;/p>
&lt;p class="language-java">Java value classes are notoriously difficult to generate correctly. There is a lot of boilerplate you must create in
order to properly implement a value class. AutoValue is a popular library for easily generating such classes by
implementing a simple abstract base class.&lt;/p>
&lt;p class="language-java">Beam can infer a schema from an AutoValue class. For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@DefaultSchema&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">AutoValueSchema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@AutoValue&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">abstract&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">TransactionValue&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">abstract&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="nf">getBank&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">abstract&lt;/span> &lt;span class="kt">double&lt;/span> &lt;span class="nf">getPurchaseAmount&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">This is all that’s needed to generate a simple AutoValue class, and the above &lt;code>@DefaultSchema&lt;/code> annotation tells Beam to
infer a schema from it. This also allows AutoValue elements to be used inside of &lt;code>PCollection&lt;/code>s.&lt;/p>
&lt;p class="language-java">&lt;code>@SchemaFieldName&lt;/code> and &lt;code>@SchemaIgnore&lt;/code> can be used to alter the schema inferred.&lt;/p>
&lt;p class="language-py">Beam has a few different mechanisms for inferring schemas from Python code.&lt;/p>
&lt;p class="language-py">&lt;strong>NamedTuple classes&lt;/strong>&lt;/p>
&lt;p class="language-py">A &lt;a href="https://docs.python.org/3/library/typing.html#typing.NamedTuple">NamedTuple&lt;/a>
class is a Python class that wraps a &lt;code>tuple&lt;/code>, assigning a name to each element
and restricting it to a particular type. Beam will automatically infer the
schema for PCollections with &lt;code>NamedTuple&lt;/code> output types. For example:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">Transaction&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">NamedTuple&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">bank&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">purchase_amount&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">float&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">input&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="o">...&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">with_output_types&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">Transaction&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
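&lt;p class="language-py">As a rough illustration of why a &lt;code>NamedTuple&lt;/code> carries enough information for schema inference, the sketch below reads the field names and annotated types back from the class using only the standard library. The helper &lt;code>field_types&lt;/code> is a hypothetical name for this example and is not part of Beam; Beam&amp;rsquo;s actual inference logic is not shown here.&lt;/p>

```python
import typing


class Transaction(typing.NamedTuple):
    bank: str
    purchase_amount: float


def field_types(cls):
    """Return {field name: annotated type} in declaration order.

    Hypothetical helper for illustration only; not a Beam API.
    """
    hints = typing.get_type_hints(cls)
    # _fields preserves the order in which the fields were declared.
    return {name: hints[name] for name in cls._fields}


if __name__ == "__main__":
    for name, field_type in field_types(Transaction).items():
        print(name, field_type.__name__)
```

&lt;p class="language-py">Each name and type pair corresponds to the kind of information a schema field needs: a field name and a field type.&lt;/p>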
&lt;p class="language-py">&lt;strong>beam.Row and Select&lt;/strong>&lt;/p>
&lt;p class="language-py">There are also methods for creating ad-hoc schema declarations. First, you can
use a lambda that returns instances of &lt;code>beam.Row&lt;/code>:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">input_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="c1"># {&amp;#34;bank&amp;#34;: ..., &amp;#34;purchase_amount&amp;#34;: ...}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input_pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">item&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bank&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;bank&amp;#34;&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">purchase_amount&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;purchase_amount&amp;#34;&lt;/span>&lt;span class="p">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">Sometimes it can be more concise to express the same logic with the
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.Select">&lt;code>Select&lt;/code>&lt;/a> transform:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">input_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="c1"># {&amp;#34;bank&amp;#34;: ..., &amp;#34;purchase_amount&amp;#34;: ...}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input_pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bank&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">item&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">item&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;bank&amp;#34;&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">purchase_amount&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">item&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">item&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;purchase_amount&amp;#34;&lt;/span>&lt;span class="p">])&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">Note that these declarations don&amp;rsquo;t include any specific information about the
types of the &lt;code>bank&lt;/code> and &lt;code>purchase_amount&lt;/code> fields, so Beam will attempt to infer
type information. If it&amp;rsquo;s unable to, it will fall back to the generic type
&lt;code>Any&lt;/code>. When that is not ideal, you can use casts to make sure Beam
correctly infers types with &lt;code>beam.Row&lt;/code> or with &lt;code>Select&lt;/code>:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">input_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="c1"># {&amp;#34;bank&amp;#34;: ..., &amp;#34;purchase_amount&amp;#34;: ...}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input_pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">item&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bank&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;bank&amp;#34;&lt;/span>&lt;span class="p">]),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">purchase_amount&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="nb">float&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;purchase_amount&amp;#34;&lt;/span>&lt;span class="p">])))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-go">Beam currently only infers schemas for exported fields in Go structs.&lt;/p>
&lt;p class="language-go">&lt;strong>Structs&lt;/strong>&lt;/p>
&lt;p class="language-go">Beam will automatically infer schemas for all Go structs used
as PCollection elements, and default to encoding them using
schema encoding.&lt;/p>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">Transaction&lt;/span> &lt;span class="kd">struct&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Bank&lt;/span> &lt;span class="kt">string&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">PurchaseAmount&lt;/span> &lt;span class="kt">float64&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">checksum&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">byte&lt;/span> &lt;span class="c1">// ignored
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-go">Unexported fields are ignored, and cannot be automatically inferred as part of the schema.
Fields of type func, channel, unsafe.Pointer, or uintptr will be ignored by inference.
Fields of interface types are ignored, unless a schema provider
is registered for them.&lt;/p>
&lt;p class="language-go">By default, schema field names will match the exported struct field names.
In the above example, &amp;ldquo;Bank&amp;rdquo; and &amp;ldquo;PurchaseAmount&amp;rdquo; are the schema field names.
A schema field name can be overridden with a struct tag for the field.&lt;/p>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">Transaction&lt;/span> &lt;span class="kd">struct&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Bank&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="s">`beam:&amp;#34;bank&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">PurchaseAmount&lt;/span> &lt;span class="kt">float64&lt;/span> &lt;span class="s">`beam:&amp;#34;purchase_amount&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-go">Overriding schema field names is useful for compatibility with cross-language transforms,
as schema fields may have different requirements or restrictions from Go exported fields.&lt;/p>
&lt;h3 id="using-schemas">6.6. Using Schema Transforms&lt;/h3>
&lt;p>A schema on a &lt;code>PCollection&lt;/code> enables a rich variety of relational transforms. The fact that each record is composed of
named fields allows for simple and readable aggregations that reference fields by name, similar to the aggregations in
a SQL expression.&lt;/p>
&lt;p class="language-go">Beam does not yet support Schema transforms natively in Go. However, it will be implemented with the following behavior.&lt;/p>
&lt;h4 id="661-field-selection-syntax">6.6.1. Field selection syntax&lt;/h4>
&lt;p>The advantage of schemas is that they allow referencing of element fields by name. Beam provides a selection syntax for
referencing fields, including nested and repeated fields. This syntax is used by all of the schema transforms when
referencing the fields they operate on. The syntax can also be used inside of a DoFn to specify which schema fields to
process.&lt;/p>
&lt;p>Addressing fields by name still retains type safety as Beam will check that schemas match at the time the pipeline graph
is constructed. If a field is specified that does not exist in the schema, the pipeline will fail to launch. In addition,
if a field is specified with a type that does not match the type of that field in the schema, the pipeline will fail to
launch.&lt;/p>
&lt;p>The following characters are not allowed in field names: . * [ ] { }&lt;/p>
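&lt;p class="language-py">The restriction above can be checked before a pipeline is built. The following is a minimal sketch in plain Python; &lt;code>is_valid_field_name&lt;/code> and &lt;code>FORBIDDEN_CHARS&lt;/code> are hypothetical names for this example, not Beam APIs, and rejecting the empty string is an added assumption rather than a rule stated above.&lt;/p>

```python
# Characters listed above as not allowed in field names, because the
# selection syntax gives them special meaning.
FORBIDDEN_CHARS = frozenset(".*[]{}")


def is_valid_field_name(name: str) -> bool:
    """Return True if name contains none of the reserved selector characters.

    Hypothetical helper for illustration only. Treating the empty string
    as invalid is an assumption of this sketch.
    """
    return bool(name) and FORBIDDEN_CHARS.isdisjoint(name)


if __name__ == "__main__":
    for candidate in ["purchase_amount", "shipping.address", "items[]"]:
        print(candidate, is_valid_field_name(candidate))
```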
&lt;h5 id="top-level-fields">&lt;strong>Top-level fields&lt;/strong>&lt;/h5>
&lt;p>In order to select a field at the top level of a schema, the name of the field is specified. For example, to select just
the user ids from a &lt;code>PCollection&lt;/code> of purchases one would write (using the &lt;code>Select&lt;/code> transform)&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fieldNames&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">input_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="c1"># {&amp;#34;user_id&amp;#34;: ...,&amp;#34;bank&amp;#34;: ..., &amp;#34;purchase_amount&amp;#34;: ...}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input_pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;user_id&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h5 id="nested-fields">&lt;strong>Nested fields&lt;/strong>&lt;/h5>
&lt;p class="language-py">Support for Nested fields hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for Nested fields hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;p class="language-java">Individual nested fields can be specified using the dot operator. For example, to select just the postal code from the
shipping address one would write&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fieldNames&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;shippingAddress.postCode&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;!--
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">input_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="c1"># {&amp;#34;user_id&amp;#34;: ..., &amp;#34;shipping_address&amp;#34;: &amp;#34;post_code&amp;#34;: ..., &amp;#34;bank&amp;#34;: ..., &amp;#34;purchase_amount&amp;#34;: ...}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input_pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">post_code&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">item&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">item&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;shipping_address.post_code&amp;#34;&lt;/span>&lt;span class="p">]))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
-->
&lt;h5 id="wildcards">&lt;strong>Wildcards&lt;/strong>&lt;/h5>
&lt;p class="language-py">Support for wildcards hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for wildcards hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;p class="language-java">The * operator can be specified at any nesting level to represent all fields at that level. For example, to select all
shipping-address fields one would write&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fieldNames&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;shippingAddress.*&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;!--
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#TODO(https://github.com/apache/beam/issues/23275): Add support for projecting nested fields&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">input_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="c1"># {&amp;#34;user_id&amp;#34;: ..., &amp;#34;shipping_address&amp;#34;: &amp;#34;post_code&amp;#34;: ..., &amp;#34;bank&amp;#34;: ..., &amp;#34;purchase_amount&amp;#34;: ...}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input_pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;shipping_address.*&amp;#34;&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
-->
&lt;h5 id="arrays">&lt;strong>Arrays&lt;/strong>&lt;/h5>
&lt;p class="language-py">Support for Array fields hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for Array fields hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;p class="language-java">An array field, where the array element type is a row, can also have subfields of the element type addressed. When
selected, the result is an array of the selected subfield type. For example&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fieldNames&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;transactions[].bank&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">Will result in a row containing an array field with element-type string, containing the list of banks for each
transaction.&lt;/p>
&lt;p class="language-java">While the use of [] brackets in the selector is recommended, to make it clear that array elements are being selected,
they can be omitted for brevity. In the future, array slicing will be supported, allowing selection of portions of the
array.&lt;/p>
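&lt;p class="language-py">To make the array semantics concrete, here is a small sketch that applies the same rule to plain Python dictionaries and lists. It does not use any Beam API; &lt;code>select_array_subfield&lt;/code> is a hypothetical helper, and it only handles a single array name followed by a single subfield.&lt;/p>

```python
def select_array_subfield(row: dict, selector: str) -> list:
    """Pick one subfield out of every element of an array field.

    Accepts both "transactions[].bank" and the shorter "transactions.bank".
    Hypothetical helper for illustration only; not a Beam API.
    """
    array_name, subfield = selector.split(".", 1)
    if array_name.endswith("[]"):
        array_name = array_name[:-2]  # the brackets are optional
    return [element[subfield] for element in row[array_name]]


purchase = {
    "userId": "user-1",
    "transactions": [
        {"bank": "bank_a", "purchaseAmount": 10.0},
        {"bank": "bank_b", "purchaseAmount": 25.5},
    ],
}

if __name__ == "__main__":
    print(select_array_subfield(purchase, "transactions[].bank"))
```

&lt;p class="language-py">The result is a list with one entry per array element, which mirrors the array of the selected subfield type described above.&lt;/p>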
&lt;h5 id="maps">&lt;strong>Maps&lt;/strong>&lt;/h5>
&lt;p>A map field, where the value type is a row, can also have subfields of the value type addressed. When selected, the
result is a map where the keys are the same as in the original map but the values have the type of the selected subfield.
As with arrays, the use of {} curly brackets in the selector is recommended to make it clear that map value elements are
being selected, but they can be omitted for brevity. In the future, map key selectors will be supported, allowing
selection of specific keys from the map. For example, given the following schema:&lt;/p>
&lt;p>&lt;strong>PurchasesByType&lt;/strong>&lt;/p>
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>purchases&lt;/td>
&lt;td>MAP{STRING, ROW{PURCHASE}}&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;p>The following&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchasesByType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fieldNames&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;purchases{}.userId&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">Support for Map fields hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for Map fields hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;p>Will result in a row containing a map field with key-type string and value-type string. The selected map will contain
all of the keys from the original map, and the values will be the userId contained in the purchase record.&lt;/p>
&lt;p>While the use of {} brackets in the selector is recommended, to make it clear that map value elements are being selected,
they can be omitted for brevity. In the future, map slicing will be supported, allowing selection of specific keys from
the map.&lt;/p>
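&lt;p class="language-py">The map rule can be sketched the same way on plain Python dictionaries. This is not a Beam API; &lt;code>select_map_subfield&lt;/code> is a hypothetical helper that handles one map name followed by one subfield, and the sample keys and values are made up for the example.&lt;/p>

```python
def select_map_subfield(row: dict, selector: str) -> dict:
    """Keep every key of a map field, replacing each value by one subfield.

    Accepts both "purchases{}.userId" and the shorter "purchases.userId".
    Hypothetical helper for illustration only; not a Beam API.
    """
    map_name, subfield = selector.split(".", 1)
    if map_name.endswith("{}"):
        map_name = map_name[:-2]  # the curly brackets are optional
    return {key: value[subfield] for key, value in row[map_name].items()}


purchases_by_type = {
    "purchases": {
        "online": {"userId": "user-1", "itemId": 7},
        "in_store": {"userId": "user-2", "itemId": 9},
    }
}

if __name__ == "__main__":
    print(select_map_subfield(purchases_by_type, "purchases{}.userId"))
```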
&lt;h4 id="662-schema-transforms">6.6.2. Schema transforms&lt;/h4>
&lt;p>Beam provides a collection of transforms that operate natively on schemas. These transforms are very expressive,
allowing selections and aggregations in terms of named schema fields. Following are some examples of useful
schema transforms.&lt;/p>
&lt;h5 id="selecting-input">&lt;strong>Selecting input&lt;/strong>&lt;/h5>
&lt;p>Often a computation is only interested in a subset of the fields in an input &lt;code>PCollection&lt;/code>. The &lt;code>Select&lt;/code> transform allows
one to easily project out only the fields of interest. The resulting &lt;code>PCollection&lt;/code> has a schema containing each selected
field as a top-level field. Both top-level and nested fields can be selected. For example, in the Purchase schema, one
could select only the userId and streetAddress fields as follows&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fieldNames&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;shippingAddress.streetAddress&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">Support for Nested fields hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for Nested fields hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;p>The resulting &lt;code>PCollection&lt;/code> will have the following schema&lt;/p>
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>userId&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>streetAddress&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
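&lt;p class="language-py">The flattening behavior, where each selected field becomes a top-level field named after its last path component, can be imitated on plain Python dictionaries. The sketch below is not Beam&amp;rsquo;s implementation; &lt;code>select_fields&lt;/code> is a hypothetical helper, the sample values are invented, and letting a later selector overwrite an earlier one with the same leaf name is an assumption of this example only.&lt;/p>

```python
def select_fields(row: dict, *selectors: str) -> dict:
    """Project dotted selectors out of a nested dict into a flat dict.

    Each output key is the last component of its selector.
    Hypothetical helper for illustration only; not a Beam API.
    """
    result = {}
    for selector in selectors:
        *parents, leaf = selector.split(".")
        value = row
        for name in parents:
            value = value[name]  # walk down through the nested rows
        # Assumption: on a name clash, the later selector wins.
        result[leaf] = value[leaf]
    return result


purchase = {
    "userId": "user-1",
    "shippingAddress": {"streetAddress": "1 Main St", "city": "Springfield"},
}

if __name__ == "__main__":
    print(select_fields(purchase, "userId", "shippingAddress.streetAddress"))
```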
&lt;p>The same is true for wildcard selections. The following&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fieldNames&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;shippingAddress.*&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">Support for Wildcards hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for Wildcards hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;p>Will result in the following schema&lt;/p>
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>userId&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>streetAddress&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>city&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>state&lt;/td>
&lt;td>nullable STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>country&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>postCode&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;p>When selecting fields nested inside of an array, the same rule applies that each selected field appears separately as a
top-level field in the resulting row. This means that if multiple fields are selected from the same nested row, each
selected field will appear as its own array field. For example&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fieldNames&lt;/span>&lt;span class="o">(&lt;/span> &lt;span class="s">&amp;#34;transactions.bank&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;transactions.purchaseAmount&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">Support for nested fields hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for nested fields hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;p class="language-java">&lt;p>Will result in the following schema&lt;/p>
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>bank&lt;/td>
&lt;td>ARRAY[STRING]&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>purchaseAmount&lt;/td>
&lt;td>ARRAY[DOUBLE]&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;/p>
&lt;p>Wildcard selections are equivalent to separately selecting each field.&lt;/p>
&lt;p>Selecting fields nested inside of maps has the same semantics as selecting fields nested inside of arrays. If you select
multiple fields from a map, then each selected field will be expanded to its own map at the top level. This means that
the set of map keys will be copied, once for each selected field.&lt;/p>
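&lt;p>The array rule described above can be sketched in plain Python. This is an illustration of the semantics only, not the Beam API: the helper &lt;code>select_from_array&lt;/code> and the sample row are hypothetical, and rows are modeled as dictionaries.&lt;/p>

```python
# Plain-Python illustration only; this is not the Beam API.
# Rows are modeled as dicts, and an array of nested rows as a list of dicts.


def select_from_array(row, array_field, subfields):
    """Return one top-level list per selected subfield of an array of rows."""
    return {
        name: [item[name] for item in row[array_field]]
        for name in subfields
    }


# Hypothetical sample row, loosely following the Purchase schema.
purchase = {
    "userId": "u1",
    "transactions": [
        {"bank": "A", "purchaseAmount": 10.0},
        {"bank": "B", "purchaseAmount": 2.5},
    ],
}

selected = select_from_array(purchase, "transactions", ["bank", "purchaseAmount"])
print(selected)
```

&lt;p>Each selected subfield becomes its own top-level array, and the arrays stay aligned by position with the original nested rows.&lt;/p>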
&lt;p>Sometimes different nested rows will have fields with the same name. Selecting multiple of these fields would result in
a name conflict, as all selected fields are put in the same row schema. When this situation arises, the
&lt;code>Select.withFieldNameAs&lt;/code> builder method can be used to provide an alternate name for the selected field.&lt;/p>
&lt;p>Another use of the Select transform is to flatten a nested schema into a single flat schema. For example&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Select&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">flattenedSchema&lt;/span>&lt;span class="o">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">Support for nested fields hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for nested fields hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;p class="language-java">&lt;p>Will result in the following schema&lt;/p>
&lt;div class="table-container-wrapper">
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>userId&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>itemId&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>shippingAddress_streetAddress&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>shippingAddress_city&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>shippingAddress_state&lt;/td>
&lt;td>nullable STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>shippingAddress_country&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>shippingAddress_postCode&lt;/td>
&lt;td>STRING&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>costCents&lt;/td>
&lt;td>INT64&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>transactions_bank&lt;/td>
&lt;td>ARRAY[STRING]&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>transactions_purchaseAmount&lt;/td>
&lt;td>ARRAY[DOUBLE]&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;/div>
&lt;/p>
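&lt;p>As a rough sketch of the naming rule visible in the table above, the following plain-Python function flattens a nested row by joining field names with an underscore. It is not the Beam implementation: the function &lt;code>flatten&lt;/code> and the sample data are hypothetical, and it only handles one level of rows inside arrays.&lt;/p>

```python
# Plain-Python illustration only; this is not the Beam API.


def flatten(row, prefix=""):
    """Flatten nested dicts, joining names with '_'.

    A non-empty list of dicts is expanded into one list per subfield,
    mirroring how an array of rows becomes several array fields.
    """
    flat = {}
    for name, value in row.items():
        full_name = prefix + name
        if isinstance(value, dict):
            flat.update(flatten(value, full_name + "_"))
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            for sub in value[0]:
                flat[full_name + "_" + sub] = [v[sub] for v in value]
        else:
            flat[full_name] = value
    return flat


# Hypothetical sample row.
purchase = {
    "userId": "u1",
    "itemId": "i1",
    "shippingAddress": {"streetAddress": "1 Main St", "city": "Springfield"},
    "costCents": 1500,
    "transactions": [
        {"bank": "A", "purchaseAmount": 10.0},
        {"bank": "B", "purchaseAmount": 5.0},
    ],
}

flat = flatten(purchase)
print(flat)
```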
&lt;h5 id="grouping-aggregations">&lt;strong>Grouping aggregations&lt;/strong>&lt;/h5>
&lt;p class="language-java">The &lt;code>Group&lt;/code> transform allows simply grouping data by any number of fields in the input schema, applying aggregations to
those groupings, and storing the result of those aggregations in a new schema field. The output of the &lt;code>Group&lt;/code> transform
has a schema with one field corresponding to each aggregation performed.&lt;/p>
&lt;p class="language-py">The &lt;code>GroupBy&lt;/code> transform allows simply grouping data by any number of fields in the input schema, applying aggregations to
those groupings, and storing the result of those aggregations in a new schema field. The output of the &lt;code>GroupBy&lt;/code> transform
has a schema with one field corresponding to each aggregation performed.&lt;/p>
&lt;p class="language-java">The simplest usage of &lt;code>Group&lt;/code> specifies no aggregations, in which case all inputs matching the provided set of fields
are grouped together into an &lt;code>ITERABLE&lt;/code> field. For example&lt;/p>
&lt;p class="language-py">The simplest usage of &lt;code>GroupBy&lt;/code> specifies no aggregations, in which case all inputs matching the provided set of fields
are grouped together into an &lt;code>ITERABLE&lt;/code> field. For example&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Group&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">byFieldNames&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;bank&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">input_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="c1"># {&amp;#34;user_id&amp;#34;: ...,&amp;#34;bank&amp;#34;: ..., &amp;#34;purchase_amount&amp;#34;: ...}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input_pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GroupBy&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;user_id&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s1">&amp;#39;bank&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-go">Support for schema-aware grouping hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;p class="lanuage-java">The output schema of this is:&lt;/p>
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>key&lt;/td>
&lt;td>ROW{userId:STRING, bank:STRING}&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>values&lt;/td>
&lt;td>ITERABLE[ROW[Purchase]]&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;p>The key field contains the grouping key and the values field contains a list of all the values that matched that key.&lt;/p>
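&lt;p>The shape of this output can be sketched without Beam. The following plain-Python function is a hypothetical stand-in for grouping with no aggregations: it builds one output record per distinct key, with a &lt;code>key&lt;/code> row and a &lt;code>values&lt;/code> list. The function name and sample rows are assumptions made for illustration.&lt;/p>

```python
# Plain-Python illustration only; this is not the Beam API.
from collections import defaultdict


def group_by_fields(rows, field_names):
    """Group rows by the given fields, returning key/values records."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in field_names)
        groups[key].append(row)
    return [
        {"key": dict(zip(field_names, key)), "values": values}
        for key, values in groups.items()
    ]


# Hypothetical sample rows.
rows = [
    {"userId": "u1", "bank": "A", "purchaseAmount": 1.0},
    {"userId": "u1", "bank": "A", "purchaseAmount": 2.0},
    {"userId": "u2", "bank": "B", "purchaseAmount": 3.0},
]

grouped = group_by_fields(rows, ["userId", "bank"])
for record in grouped:
    print(record["key"], len(record["values"]))
```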
&lt;p>The names of the key and values fields in the output schema can be controlled using the &lt;code>withKeyField&lt;/code> and
&lt;code>withValueField&lt;/code> builders, as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Group&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">byFieldNames&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;shippingAddress.streetAddress&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyField&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userAndStreet&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueField&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;matchingPurchases&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>It is quite common to apply one or more aggregations to the grouped result. Each aggregation can specify one or more fields
to aggregate, an aggregation function, and the name of the resulting field in the output schema. For example, the
following application computes three aggregations grouped by userId, with all aggregations represented in a single
output schema:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Group&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">byFieldNames&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">aggregateField&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;itemId&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Count&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">combineFn&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="s">&amp;#34;numPurchases&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">aggregateField&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;costCents&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Sum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ofLongs&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="s">&amp;#34;totalSpendCents&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">aggregateField&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;costCents&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Top&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">largestLongsFn&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">10&lt;/span>&lt;span class="o">),&lt;/span> &lt;span class="s">&amp;#34;topPurchases&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">input_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="c1"># {&amp;#34;user_id&amp;#34;: ..., &amp;#34;item_Id&amp;#34;: ..., &amp;#34;cost_cents&amp;#34;: ...}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">output_pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input_pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GroupBy&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;user_id&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">aggregate_field&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;item_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">CountCombineFn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;num_purchases&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">aggregate_field&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;cost_cents&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">sum&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;total_spendcents&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="n">aggregate_field&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;cost_cents&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TopCombineFn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;top_purchases&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-go">Support for schema-aware grouping hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;p>The result of this aggregation will have the following schema:&lt;/p>
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>key&lt;/td>
&lt;td>ROW{userId:STRING}&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>value&lt;/td>
&lt;td>ROW{numPurchases: INT64, totalSpendCents: INT64, topPurchases: ARRAY[INT64]}&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;p>Often &lt;code>Select.flattenedSchema&lt;/code> will be used to flatten the result into a non-nested, flat schema.&lt;/p>
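&lt;p>To make the key/value shape of an aggregated result concrete, here is a minimal plain-Python sketch of the three aggregations discussed above (count, sum, and top N) grouped by user. It does not use Beam: the function &lt;code>aggregate_by_user&lt;/code> and the sample data are hypothetical.&lt;/p>

```python
# Plain-Python illustration only; this is not the Beam API.
import heapq
from collections import defaultdict


def aggregate_by_user(purchases, top_n=10):
    """Compute count, sum, and top-N cost per userId."""
    by_user = defaultdict(list)
    for purchase in purchases:
        by_user[purchase["userId"]].append(purchase)

    results = []
    for user_id, rows in by_user.items():
        costs = [row["costCents"] for row in rows]
        results.append({
            "key": {"userId": user_id},
            "value": {
                "numPurchases": len(rows),
                "totalSpendCents": sum(costs),
                # nlargest returns the values in descending order.
                "topPurchases": heapq.nlargest(top_n, costs),
            },
        })
    return results


# Hypothetical sample rows.
purchases = [
    {"userId": "u1", "itemId": "i1", "costCents": 500},
    {"userId": "u1", "itemId": "i2", "costCents": 1500},
    {"userId": "u2", "itemId": "i3", "costCents": 700},
]

aggregated = aggregate_by_user(purchases)
print(aggregated)
```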
&lt;h5 id="joins">&lt;strong>Joins&lt;/strong>&lt;/h5>
&lt;p>Beam supports equijoins on schema &lt;code>PCollections&lt;/code> - namely joins where the join condition depends on the equality of a
subset of fields. For example, the following example uses the Purchases schema to join transactions with the reviews
that are likely associated with that transaction (both the user and product match that in the transaction). This is a
&amp;ldquo;natural join&amp;rdquo; - one in which the same field names are used on both the left-hand and right-hand sides of the join -
and is specified with the &lt;code>using&lt;/code> method:&lt;/p>
&lt;p class="language-py">Support for joins hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for joins hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Transaction&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">transactions&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readTransactions&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Review&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">reviews&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readReviews&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">joined&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">transactions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Join&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">innerJoin&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">reviews&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">using&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;productId&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">&lt;p>The resulting schema is the following:&lt;/p>
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>&lt;b>Field Name&lt;/b>&lt;/th>
&lt;th>&lt;b>Field Type&lt;/b>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>lhs&lt;/td>
&lt;td>ROW{Transaction}&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>rhs&lt;/td>
&lt;td>ROW{Review}&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;br/>
&lt;/p>
&lt;p>Each resulting row contains one Transaction and one Review that matched the join condition.&lt;/p>
&lt;p>If the fields to match in the two schemas have different names, then the &lt;code>on&lt;/code> function can be used. For example, if the
Review schema named those fields differently than the Transaction schema, then we could write the following:&lt;/p>
&lt;p class="language-py">Support for joins hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for joins hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">joined&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">transactions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Join&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">innerJoin&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">reviews&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">on&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FieldsEqual&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">left&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;productId&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">right&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;reviewUserId&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;reviewProductId&amp;#34;&lt;/span>&lt;span class="o">)));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>In addition to inner joins, the Join transform supports full outer joins, left outer joins, and right outer joins.&lt;/p>
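&lt;p>The inner equijoin semantics described above can be sketched in plain Python. This is not how Beam executes a join; it is a small hash join over in-memory lists, and the function &lt;code>inner_join&lt;/code> and the sample rows are hypothetical. Passing the same field names for both sides corresponds to the natural-join case.&lt;/p>

```python
# Plain-Python illustration only; this is not the Beam API.
from collections import defaultdict


def inner_join(left_rows, right_rows, left_fields, right_fields):
    """Join rows whose left_fields values equal the right_fields values."""
    # Index the right-hand side by its join key.
    index = defaultdict(list)
    for right in right_rows:
        index[tuple(right[f] for f in right_fields)].append(right)

    joined = []
    for left in left_rows:
        key = tuple(left[f] for f in left_fields)
        for right in index.get(key, []):
            joined.append({"lhs": left, "rhs": right})
    return joined


# Hypothetical sample rows.
transactions = [
    {"userId": "u1", "productId": "p1", "amount": 10},
    {"userId": "u2", "productId": "p2", "amount": 5},
]
reviews = [
    {"reviewUserId": "u1", "reviewProductId": "p1", "stars": 4},
]

joined = inner_join(
    transactions, reviews,
    ["userId", "productId"],
    ["reviewUserId", "reviewProductId"],
)
print(joined)
```

&lt;p>Each output record holds the matching left-hand row under &lt;code>lhs&lt;/code> and the matching right-hand row under &lt;code>rhs&lt;/code>; unmatched rows are dropped, which is what distinguishes an inner join from the outer variants.&lt;/p>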
&lt;h5 id="complex-joins">&lt;strong>Complex joins&lt;/strong>&lt;/h5>
&lt;p>While most joins tend to be binary joins - joining two inputs together - sometimes you have more than two input
streams that all need to be joined on a common key. The &lt;code>CoGroup&lt;/code> transform allows joining multiple &lt;code>PCollections&lt;/code>
together based on equality of schema fields. Each &lt;code>PCollection&lt;/code> can be marked as required or optional in the final
join record, providing a generalization of outer joins to joins with greater than two input &lt;code>PCollection&lt;/code>s. The output
can optionally be expanded - providing individual joined records, as in the &lt;code>Join&lt;/code> transform. The output can also be
processed in unexpanded format - providing the join key along with Iterables of all records from each input that matched
that key.&lt;/p>
&lt;p class="language-py">Support for joins hasn&amp;rsquo;t been developed for the Python SDK yet.&lt;/p>
&lt;p class="language-go">Support for joins hasn&amp;rsquo;t been developed for the Go SDK yet.&lt;/p>
&lt;h5 id="filtering-events">&lt;strong>Filtering events&lt;/strong>&lt;/h5>
&lt;p>The &lt;code>Filter&lt;/code> transform can be configured with a set of predicates, each one based on specified fields. Only records for
which all predicates return true will pass the filter. For example, the following&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">whereFieldName&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;costCents&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">c&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">c&lt;/span> &lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">100&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">20&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">whereFieldName&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;shippingAddress.country&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">c&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">equals&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;de&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Will produce all purchases made from Germany with a purchase price of greater than twenty dollars (2,000 cents).&lt;/p>
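&lt;p>The "all predicates must hold" rule can be sketched in plain Python. This is not the Beam &lt;code>Filter&lt;/code> transform: the helpers &lt;code>get_field&lt;/code> and &lt;code>filter_rows&lt;/code> and the sample rows are hypothetical, and nested fields are addressed with a dotted path.&lt;/p>

```python
# Plain-Python illustration only; this is not the Beam API.


def get_field(row, path):
    """Look up a possibly nested field using a dotted path."""
    value = row
    for part in path.split("."):
        value = value[part]
    return value


def filter_rows(rows, predicates):
    """Keep only rows for which every (path, predicate) pair is true."""
    return [
        row for row in rows
        if all(pred(get_field(row, path)) for path, pred in predicates)
    ]


predicates = [
    ("costCents", lambda c: c > 100 * 20),
    ("shippingAddress.country", lambda c: c == "de"),
]

# Hypothetical sample rows.
rows = [
    {"costCents": 2500, "shippingAddress": {"country": "de"}},
    {"costCents": 1500, "shippingAddress": {"country": "de"}},
    {"costCents": 3000, "shippingAddress": {"country": "fr"}},
]

kept = filter_rows(rows, predicates)
print(kept)
```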
&lt;h5 id="adding-fields-to-a-schema">&lt;strong>Adding fields to a schema&lt;/strong>&lt;/h5>
&lt;p>The AddFields transform can be used to extend a schema with new fields. Input rows will be extended to the new schema by
inserting null values for the new fields, though alternate default values can be specified; if the default null value
is used then the new field type will be marked as nullable. Nested subfields can be added using the field selection
syntax, including nested fields inside arrays or map values.&lt;/p>
&lt;p>For example, the following application&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">AddFields&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">create&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">field&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timeOfDaySeconds&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">FieldType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">INT32&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">field&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;shippingAddress.deliveryNotes&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">FieldType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">STRING&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">field&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;transactions.isFlagged&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">FieldType&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">BOOLEAN&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kc">false&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Results in a &lt;code>PCollection&lt;/code> with an expanded schema: every row and field of the input is preserved, and the specified
fields are added to the schema. All resulting rows will have null values filled in for the &lt;strong>timeOfDaySeconds&lt;/strong> and the
&lt;strong>shippingAddress.deliveryNotes&lt;/strong> fields, and a false value filled in for the &lt;strong>transactions.isFlagged&lt;/strong> field.&lt;/p>
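&lt;p>As a rough illustration of how defaults are filled in, including for a field nested inside an array, consider the following plain-Python sketch. It is not the &lt;code>AddFields&lt;/code> implementation: the function &lt;code>add_field&lt;/code>, its helper, and the sample row are hypothetical, and schema nullability is not modeled.&lt;/p>

```python
# Plain-Python illustration only; this is not the Beam API.
import copy


def _add(node, parts, default):
    """Walk the dotted path; lists are treated as arrays of rows."""
    if isinstance(node, list):
        for item in node:
            _add(item, parts, default)
    elif len(parts) == 1:
        node[parts[0]] = default
    else:
        _add(node[parts[0]], parts[1:], default)


def add_field(row, path, default=None):
    """Return a copy of row with a new field set to default (None if omitted)."""
    out = copy.deepcopy(row)
    _add(out, path.split("."), default)
    return out


# Hypothetical sample row.
purchase = {
    "userId": "u1",
    "shippingAddress": {"city": "Springfield"},
    "transactions": [{"bank": "A"}, {"bank": "B"}],
}

extended = add_field(purchase, "timeOfDaySeconds")
extended = add_field(extended, "shippingAddress.deliveryNotes")
extended = add_field(extended, "transactions.isFlagged", False)
print(extended)
```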
&lt;h5 id="removing-fields-from-a-schema">&lt;strong>Removing fields from a schema&lt;/strong>&lt;/h5>
&lt;p>&lt;code>DropFields&lt;/code> allows specific fields to be dropped from a schema. Input rows will have their schemas truncated, and any
values for dropped fields will be removed from the output. Nested fields can also be dropped using the field selection
syntax.&lt;/p>
&lt;p>For example, the following snippet&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">DropFields&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fields&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;shippingAddress.streetAddress&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Results in a copy of the input with those two fields and their corresponding values removed.&lt;/p>
&lt;h5 id="renaming-schema-fields">&lt;strong>Renaming schema fields&lt;/strong>&lt;/h5>
&lt;p>&lt;code>RenameFields&lt;/code> allows specific fields in a schema to be renamed. The field values in input rows are left unchanged; only
the schema is modified. This transform is often used to prepare records for output to a schema-aware sink, such as an
RDBMS, to make sure that the &lt;code>PCollection&lt;/code> schema field names match those of the output. It can also be used to rename
fields generated by other transforms to make them more usable (similar to SELECT AS in SQL). Nested fields can also be
renamed using the field-selection syntax.&lt;/p>
&lt;p>For example, the following snippet&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">RenameFields&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">create&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">rename&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;userIdentifier&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">rename&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;shippingAddress.streetAddress&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;shippingAddress.street&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Results in the same set of unmodified input elements; however, the schema on the &lt;code>PCollection&lt;/code> has been changed to rename
&lt;strong>userId&lt;/strong> to &lt;strong>userIdentifier&lt;/strong> and &lt;strong>shippingAddress.streetAddress&lt;/strong> to &lt;strong>shippingAddress.street&lt;/strong>.&lt;/p>
&lt;h5 id="converting-between-types">&lt;strong>Converting between types&lt;/strong>&lt;/h5>
&lt;p>As mentioned, Beam can automatically convert between different Java types, as long as those types have equivalent
schemas. One way to do this is by using the &lt;code>Convert&lt;/code> transform, as follows.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PurchaseBean&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">purchaseBeans&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPurchasesAsBeans&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">pojoPurchases&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">purchaseBeans&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Convert&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Beam will validate that the inferred schema for &lt;code>PurchasePojo&lt;/code> matches that of the input &lt;code>PCollection&lt;/code>, and will
then cast to a &lt;code>PCollection&amp;lt;PurchasePojo&amp;gt;&lt;/code>.&lt;/p>
&lt;p>Since the &lt;code>Row&lt;/code> class can support any schema, any &lt;code>PCollection&lt;/code> with schema can be cast to a &lt;code>PCollection&lt;/code> of rows, as
follows.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">purchaseRows&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">purchaseBeans&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Convert&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">toRows&lt;/span>&lt;span class="o">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>If the source type is a single-field schema, Convert will also convert to the type of the field if asked, effectively
unboxing the row. For example, given a schema with a single INT64 field, the following will convert it to a
&lt;code>PCollection&amp;lt;Long&amp;gt;&lt;/code>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">longs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">rows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Convert&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">longs&lt;/span>&lt;span class="o">()));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>In all cases, type checking is done at pipeline graph construction, and if the types do not match the schema then the
pipeline will fail to launch.&lt;/p>
&lt;h4 id="663-schemas-in-pardo">6.6.3. Schemas in ParDo&lt;/h4>
&lt;p>A &lt;code>PCollection&lt;/code> with a schema can apply a &lt;code>ParDo&lt;/code>, just like any other &lt;code>PCollection&lt;/code>. However, the Beam runner is aware
of schemas when applying a &lt;code>ParDo&lt;/code>, which enables additional functionality.&lt;/p>
&lt;h5 id="input-conversion">&lt;strong>Input conversion&lt;/strong>&lt;/h5>
&lt;p class="language-go">Beam does not yet support input conversion in Go.&lt;/p>
&lt;p>Since Beam knows the schema of the source &lt;code>PCollection&lt;/code>, it can automatically convert the elements to any Java type for
which a matching schema is known. For example, using the above-mentioned Purchase schema, say we have the following
&lt;code>PCollection&lt;/code>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">purchases&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPurchases&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>If there were no schema, then the applied &lt;code>DoFn&lt;/code> would have to accept an element of type &lt;code>PurchasePojo&lt;/code>. However,
since there is a schema, you could apply the following &lt;code>DoFn&lt;/code>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">PurchaseBean&lt;/span> &lt;span class="n">purchase&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Even though the &lt;code>@Element&lt;/code> parameter does not match the Java type of the &lt;code>PCollection&lt;/code>, since it has a matching schema
Beam will automatically convert elements. If the schema does not match, Beam will detect this at graph-construction time
and will fail the job with a type error.&lt;/p>
&lt;p>Since every schema can be represented by a Row type, Row can also be used here:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">Row&lt;/span> &lt;span class="n">purchase&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h5 id="input-selection">&lt;strong>Input selection&lt;/strong>&lt;/h5>
&lt;p>Since the input has a schema, you can also automatically select specific fields to process in the DoFn.&lt;/p>
&lt;p>Given the above purchases &lt;code>PCollection&lt;/code>, say you want to process just the userId and the itemId fields. You can do this
using the above-described selection expressions, as follows:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@FieldAccess&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;userId&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">userId&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="nd">@FieldAccess&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;itemId&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kt">long&lt;/span> &lt;span class="n">itemId&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>You can also select nested fields, as follows.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">purchases&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PurchasePojo&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@FieldAccess&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;shippingAddress.streetAddress&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">street&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>For more information, see the section on field-selection expressions. When selecting subschemas, Beam will
automatically convert to any matching schema type, just like when reading the entire row.&lt;/p>
&lt;h2 id="data-encoding-and-type-safety">7. Data encoding and type safety&lt;/h2>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;li data-value="go">Go SDK&lt;/li>
&lt;li data-value="typescript">TypeScript SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;p>When Beam runners execute your pipeline, they often need to materialize the
intermediate data in your &lt;code>PCollection&lt;/code>s, which requires converting elements to
and from byte strings. The Beam SDKs use objects called &lt;code>Coder&lt;/code>s to describe how
the elements of a given &lt;code>PCollection&lt;/code> may be encoded and decoded.&lt;/p>
&lt;blockquote>
&lt;p>Note that coders are unrelated to parsing or formatting data when interacting
with external data sources or sinks. Such parsing or formatting should
typically be done explicitly, using transforms such as &lt;code>ParDo&lt;/code> or
&lt;code>MapElements&lt;/code>.&lt;/p>
&lt;/blockquote>
&lt;p class="language-java">In the Beam SDK for Java, the type &lt;code>Coder&lt;/code> provides the methods required for
encoding and decoding data. The SDK for Java provides a number of Coder
subclasses that work with a variety of standard Java types, such as Integer,
Long, Double, StringUtf8 and more. You can find all of the available Coder
subclasses in the &lt;a href="https://github.com/apache/beam/tree/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders">Coder package&lt;/a>.&lt;/p>
&lt;p class="language-py">In the Beam SDK for Python, the type &lt;code>Coder&lt;/code> provides the methods required for
encoding and decoding data. The SDK for Python provides a number of Coder
subclasses that work with a variety of standard Python types, such as primitive
types, Tuple, Iterable, StringUtf8 and more. You can find all of the available
Coder subclasses in the
&lt;a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/coders">apache_beam.coders&lt;/a>
package.&lt;/p>
&lt;p class="language-go">Standard Go types like &lt;code>int&lt;/code>, &lt;code>int64&lt;/code>, &lt;code>float64&lt;/code>, &lt;code>[]byte&lt;/code>, and &lt;code>string&lt;/code> are coded using built-in coders.
Structs and pointers to structs default to Beam Schema Row encoding.
However, users can build and register custom coders with &lt;code>beam.RegisterCoder&lt;/code>.
You can find available Coder functions in the
&lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder">coder&lt;/a>
package.&lt;/p>
&lt;p class="language-typescript">Standard TypeScript types like &lt;code>number&lt;/code>, &lt;code>Uint8Array&lt;/code>, and &lt;code>string&lt;/code> are coded using built-in coders.
JSON objects and arrays are encoded via a BSON encoding.
For these types, coders need not be specified unless interacting with cross-language transforms.
Users can build custom coders by extending &lt;code>beam.coders.Coder&lt;/code>
for use with &lt;code>withCoderInternal&lt;/code>, but generally logical types are preferred for this case.&lt;/p>
&lt;blockquote>
&lt;p>Note that coders do not necessarily have a 1:1 relationship with types. For
example, the Integer type can have multiple valid coders, and input and output
data can use different Integer coders. A transform might have Integer-typed
input data that uses BigEndianIntegerCoder, and Integer-typed output data that
uses VarIntCoder.&lt;/p>
&lt;/blockquote>
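&lt;p>As a minimal sketch of why one type can have several valid coders, the following standalone Java program encodes the same integer in a fixed-width big-endian form and in a variable-length form. The class &lt;code>IntegerEncodingSketch&lt;/code> and its methods are hypothetical helpers written for this illustration; they are not Beam classes, and the byte layouts are only assumed to resemble what &lt;code>BigEndianIntegerCoder&lt;/code> and &lt;code>VarIntCoder&lt;/code> produce.&lt;/p>

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration only: these helpers are not part of the Beam SDK.
final class IntegerEncodingSketch {

  // Fixed-width encoding: always four bytes, most significant byte first.
  // ByteBuffer uses big-endian byte order by default.
  static byte[] encodeBigEndian(int value) {
    return ByteBuffer.allocate(4).putInt(value).array();
  }

  // Variable-length encoding: seven payload bits per byte, least significant
  // group first. Every byte except the last has its high bit (128) set.
  // The int is treated as unsigned, so negative values take five bytes.
  static byte[] encodeVarInt(int value) {
    long remaining = Integer.toUnsignedLong(value);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while (remaining / 128 != 0) {
      out.write((int) (remaining % 128) + 128);
      remaining = remaining / 128;
    }
    out.write((int) remaining);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // The same value, 300, under two different encodings.
    System.out.println(Arrays.toString(encodeBigEndian(300))); // prints [0, 0, 1, 44]
    System.out.println(Arrays.toString(encodeVarInt(300)));    // prints [-84, 2]
  }
}
```

&lt;p>Both byte arrays represent the value 300, so a decoder has to know which encoding was used. This is why each &lt;code>PCollection&lt;/code> carries a specific coder rather than only an element type.&lt;/p>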
&lt;h3 id="specifying-coders">7.1. Specifying coders&lt;/h3>
&lt;p>The Beam SDKs require a coder for every &lt;code>PCollection&lt;/code> in your pipeline. In most
cases, the Beam SDK is able to automatically infer a &lt;code>Coder&lt;/code> for a &lt;code>PCollection&lt;/code>
based on its element type or the transform that produces it. However, in some
cases the pipeline author will need to specify a &lt;code>Coder&lt;/code> explicitly, or develop
a &lt;code>Coder&lt;/code> for their custom type.&lt;/p>
&lt;p class="language-java">You can explicitly set the coder for an existing &lt;code>PCollection&lt;/code> by using the
method &lt;code>PCollection.setCoder&lt;/code>. Note that you cannot call &lt;code>setCoder&lt;/code> on a
&lt;code>PCollection&lt;/code> that has been finalized (e.g. by calling &lt;code>.apply&lt;/code> on it).&lt;/p>
&lt;p class="language-java">You can get the coder for an existing &lt;code>PCollection&lt;/code> by using the method
&lt;code>getCoder&lt;/code>. This method will fail with an &lt;code>IllegalStateException&lt;/code> if a coder has
not been set and cannot be inferred for the given &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>Beam SDKs use a variety of mechanisms when attempting to automatically infer the
&lt;code>Coder&lt;/code> for a &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p class="language-java">Each pipeline object has a &lt;code>CoderRegistry&lt;/code>. The &lt;code>CoderRegistry&lt;/code> represents a
mapping of Java types to the default coders that the pipeline should use for
&lt;code>PCollection&lt;/code>s of each type.&lt;/p>
&lt;p class="language-py">The Beam SDK for Python has a &lt;code>CoderRegistry&lt;/code> that represents a mapping of
Python types to the default coder that should be used for &lt;code>PCollection&lt;/code>s of each
type.&lt;/p>
&lt;p class="language-go">The Beam SDK for Go allows users to register default coder
implementations with &lt;code>beam.RegisterCoder&lt;/code>.&lt;/p>
&lt;p class="language-java">By default, the Beam SDK for Java automatically infers the &lt;code>Coder&lt;/code> for the
elements of a &lt;code>PCollection&lt;/code> produced by a &lt;code>PTransform&lt;/code> using the type parameter
from the transform&amp;rsquo;s function object, such as &lt;code>DoFn&lt;/code>. In the case of &lt;code>ParDo&lt;/code>,
for example, a &lt;code>DoFn&amp;lt;Integer, String&amp;gt;&lt;/code> function object accepts an input element
of type &lt;code>Integer&lt;/code> and produces an output element of type &lt;code>String&lt;/code>. In such a
case, the SDK for Java will automatically infer the default &lt;code>Coder&lt;/code> for the
output &lt;code>PCollection&amp;lt;String&amp;gt;&lt;/code> (in the default pipeline &lt;code>CoderRegistry&lt;/code>, this is
&lt;code>StringUtf8Coder&lt;/code>).&lt;/p>
&lt;p class="language-py">By default, the Beam SDK for Python automatically infers the &lt;code>Coder&lt;/code> for the
elements of an output &lt;code>PCollection&lt;/code> using the typehints from the transform&amp;rsquo;s
function object, such as &lt;code>DoFn&lt;/code>. In the case of &lt;code>ParDo&lt;/code>, for example a &lt;code>DoFn&lt;/code>
with the typehints &lt;code>@beam.typehints.with_input_types(int)&lt;/code> and
&lt;code>@beam.typehints.with_output_types(str)&lt;/code> accepts an input element of type int
and produces an output element of type str. In such a case, the Beam SDK for
Python will automatically infer the default &lt;code>Coder&lt;/code> for the output &lt;code>PCollection&lt;/code>
(in the default pipeline &lt;code>CoderRegistry&lt;/code>, this is &lt;code>StrUtf8Coder&lt;/code>).&lt;/p>
&lt;p class="language-go">By default, the Beam SDK for Go automatically infers the &lt;code>Coder&lt;/code> for the elements of an output &lt;code>PCollection&lt;/code> by the output of the transform&amp;rsquo;s function object, such as a &lt;code>DoFn&lt;/code>.
In the case of &lt;code>ParDo&lt;/code>, for example a &lt;code>DoFn&lt;/code>
with the parameters of &lt;code>v int, emit func(string)&lt;/code> accepts an input element of type &lt;code>int&lt;/code>
and produces an output element of type &lt;code>string&lt;/code>.
In such a case, the Beam SDK for Go will automatically infer the default &lt;code>Coder&lt;/code> for the output &lt;code>PCollection&lt;/code> to be the &lt;code>string_utf8&lt;/code> coder.&lt;/p>
&lt;span class="language-java">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> If you create your &lt;code>PCollection&lt;/code> from in-memory data by using the
&lt;code>Create&lt;/code> transform, you cannot rely on coder inference and default coders.
&lt;code>Create&lt;/code> does not have access to any typing information for its arguments, and
may not be able to infer a coder if the argument list contains a value whose
exact run-time class doesn&amp;rsquo;t have a default coder registered.&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;p class="language-java">When using &lt;code>Create&lt;/code>, the simplest way to ensure that you have the correct coder
is by invoking &lt;code>withCoder&lt;/code> when you apply the &lt;code>Create&lt;/code> transform.&lt;/p>
&lt;h3 id="default-coders-and-the-coderregistry">7.2. Default coders and the CoderRegistry&lt;/h3>
&lt;p>Each Pipeline object has a &lt;code>CoderRegistry&lt;/code> object, which maps language types to
the default coder the pipeline should use for those types. You can use the
&lt;code>CoderRegistry&lt;/code> yourself to look up the default coder for a given type, or to
register a new default coder for a given type.&lt;/p>
&lt;p>&lt;code>CoderRegistry&lt;/code> contains a default mapping of coders to standard
&lt;span class="language-java">Java&lt;/span>&lt;span class="language-py">Python&lt;/span>
types for any pipeline you create using the Beam SDK for
&lt;span class="language-java">Java&lt;/span>&lt;span class="language-py">Python&lt;/span>.
The following table shows the standard mapping:&lt;/p>
&lt;p class="language-java">&lt;div class="table-container-wrapper">
&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>Java Type&lt;/th>
&lt;th>Default Coder&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>Double&lt;/td>
&lt;td>DoubleCoder&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>Instant&lt;/td>
&lt;td>InstantCoder&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>Integer&lt;/td>
&lt;td>VarIntCoder&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>Iterable&lt;/td>
&lt;td>IterableCoder&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>KV&lt;/td>
&lt;td>KvCoder&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>List&lt;/td>
&lt;td>ListCoder&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>Map&lt;/td>
&lt;td>MapCoder&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>Long&lt;/td>
&lt;td>VarLongCoder&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>String&lt;/td>
&lt;td>StringUtf8Coder&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>TableRow&lt;/td>
&lt;td>TableRowJsonCoder&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>Void&lt;/td>
&lt;td>VoidCoder&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>byte[ ]&lt;/td>
&lt;td>ByteArrayCoder&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>TimestampedValue&lt;/td>
&lt;td>TimestampedValueCoder&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;/p>
&lt;p class="language-py">&lt;table class="table-wrapper--pr">
&lt;thead>
&lt;tr class="header">
&lt;th>Python Type&lt;/th>
&lt;th>Default Coder&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr class="odd">
&lt;td>int&lt;/td>
&lt;td>VarIntCoder&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>float&lt;/td>
&lt;td>FloatCoder&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>str&lt;/td>
&lt;td>StrUtf8Coder&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td>bytes&lt;/td>
&lt;td>BytesCoder&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td>Tuple&lt;/td>
&lt;td>TupleCoder&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/p>
&lt;h4 id="default-coder-lookup">7.2.1. Looking up a default coder&lt;/h4>
&lt;p class="language-java">You can use the method &lt;code>CoderRegistry.getCoder&lt;/code> to determine the default
Coder for a Java type. You can access the &lt;code>CoderRegistry&lt;/code> for a given pipeline
by using the method &lt;code>Pipeline.getCoderRegistry&lt;/code>. This allows you to determine
(or set) the default Coder for a Java type on a per-pipeline basis: i.e. &amp;ldquo;for
this pipeline, verify that Integer values are encoded using
&lt;code>BigEndianIntegerCoder&lt;/code>.&amp;rdquo;&lt;/p>
&lt;p class="language-py">You can use the method &lt;code>CoderRegistry.get_coder&lt;/code> to determine the default Coder
for a Python type. You can use &lt;code>coders.registry&lt;/code> to access the &lt;code>CoderRegistry&lt;/code>.
This allows you to determine (or set) the default Coder for a Python type.&lt;/p>
&lt;p class="language-go">You can use the &lt;code>beam.NewCoder&lt;/code> function to determine the default Coder for a Go type.&lt;/p>
&lt;h4 id="setting-default-coder">7.2.2. Setting the default coder for a type&lt;/h4>
&lt;p class="language-java language-py">To set the default Coder for a
&lt;span class="language-java">Java&lt;/span>&lt;span class="language-py">Python&lt;/span>
type for a particular pipeline, you obtain and modify the pipeline&amp;rsquo;s
&lt;code>CoderRegistry&lt;/code>. You use the method
&lt;span class="language-java">&lt;code>Pipeline.getCoderRegistry&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>coders.registry&lt;/code>&lt;/span>
to get the &lt;code>CoderRegistry&lt;/code> object, and then use the method
&lt;span class="language-java">&lt;code>CoderRegistry.registerCoderForClass&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>CoderRegistry.register_coder&lt;/code>&lt;/span>
to register a new &lt;code>Coder&lt;/code> for the target type.&lt;/p>
&lt;p class="language-go">To set the default Coder for a Go type, use the function &lt;code>beam.RegisterCoder&lt;/code> to register encoder and decoder functions for the target type.
However, built-in types like &lt;code>int&lt;/code>, &lt;code>string&lt;/code>, &lt;code>float64&lt;/code>, etc. cannot have their coders overridden.&lt;/p>
&lt;p class="language-java language-py">The following example code demonstrates how to set a default Coder, in this case
&lt;code>BigEndianIntegerCoder&lt;/code>, for
&lt;span class="language-java">Integer&lt;/span>&lt;span class="language-py">int&lt;/span>
values for a pipeline.&lt;/p>
&lt;p class="language-go">The following example code demonstrates how to set a custom Coder for &lt;code>MyCustomType&lt;/code> elements.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PipelineOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">Pipeline&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">CoderRegistry&lt;/span> &lt;span class="n">cr&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getCoderRegistry&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">cr&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">registerCoderForClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">BigEndianIntegerCoder&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">apache_beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">coders&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">registry&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">register_coder&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">int&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">BigEndianIntegerCoder&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">MyCustomType&lt;/span> &lt;span class="kd">struct&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// See documentation on beam.RegisterCoder for other supported coder forms.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">encode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">MyCustomType&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">byte&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">decode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">b&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">byte&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">MyCustomType&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="o">...&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RegisterCoder&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">TypeOf&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">MyCustomType&lt;/span>&lt;span class="p">)(&lt;/span>&lt;span class="kc">nil&lt;/span>&lt;span class="p">)).&lt;/span>&lt;span class="nf">Elem&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="nx">encode&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">decode&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="annotating-custom-type-default-coder">7.2.3. Annotating a custom data type with a default coder&lt;/h4>
&lt;span class="language-java">
&lt;p>If your pipeline program defines a custom data type, you can use the
&lt;code>@DefaultCoder&lt;/code> annotation to specify the coder to use with that type.
By default, Beam will use &lt;code>SerializableCoder&lt;/code> which uses Java serialization,
but it has drawbacks:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>It is inefficient in encoding size and speed.
See this &lt;a href="https://blog.softwaremill.com/the-best-serialization-strategy-for-event-sourcing-9321c299632b">comparison of Java serialization methods&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>It is non-deterministic: it may produce different binary encodings for two
equivalent objects.&lt;/p>
&lt;p>For key/value pairs, the correctness of key-based operations
(GroupByKey, Combine) and per-key State depends on having a deterministic
coder for the key.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>You can use the &lt;code>@DefaultCoder&lt;/code> annotation to set a new default as follows:&lt;/p>
&lt;/span>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@DefaultCoder&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">AvroCoder&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">MyCustomDataType&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">If you&amp;rsquo;ve created a custom coder to match your data type, and you want to use
the &lt;code>@DefaultCoder&lt;/code> annotation, your coder class must implement a static
&lt;code>Coder.of(Class&amp;lt;T&amp;gt;)&lt;/code> factory method.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">MyCustomCoder&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">Coder&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">T&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">Coder&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">T&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Class&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">T&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">clazz&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{...}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@DefaultCoder&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MyCustomCoder&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">MyCustomDataType&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py language-go">The Beam SDK for &lt;span class="language-py">Python&lt;/span>&lt;span class="language-go">Go&lt;/span>
does not support annotating data types with a default coder.
If you would like to set a default coder, use the method described in the
previous section, &lt;em>Setting the default coder for a type&lt;/em>.&lt;/p>
&lt;h2 id="windowing">8. Windowing&lt;/h2>
&lt;p>Windowing subdivides a &lt;code>PCollection&lt;/code> according to the timestamps of its
individual elements. Transforms that aggregate multiple elements, such as
&lt;code>GroupByKey&lt;/code> and &lt;code>Combine&lt;/code>, work implicitly on a per-window basis — they process
each &lt;code>PCollection&lt;/code> as a succession of multiple, finite windows, though the
entire collection itself may be of unbounded size.&lt;/p>
&lt;p>A related concept, called &lt;strong>triggers&lt;/strong>, determines when to emit the results of
aggregation as unbounded data arrives. You can use triggers to refine the
windowing strategy for your &lt;code>PCollection&lt;/code>. Triggers allow you to deal with
late-arriving data or to provide early results. See the &lt;a href="#triggers">triggers&lt;/a>
section for more information.&lt;/p>
&lt;h3 id="windowing-basics">8.1. Windowing basics&lt;/h3>
&lt;p>Some Beam transforms, such as &lt;code>GroupByKey&lt;/code> and &lt;code>Combine&lt;/code>, group multiple
elements by a common key. Ordinarily, that grouping operation groups all of the
elements that have the same key within the entire data set. With an unbounded
data set, it is impossible to collect all of the elements, since new elements
are constantly being added and may be infinitely many (e.g. streaming data). If
you are working with unbounded &lt;code>PCollection&lt;/code>s, windowing is especially useful.&lt;/p>
&lt;p>In the Beam model, any &lt;code>PCollection&lt;/code> (including unbounded &lt;code>PCollection&lt;/code>s) can be
subdivided into logical windows. Each element in a &lt;code>PCollection&lt;/code> is assigned to
one or more windows according to the &lt;code>PCollection&lt;/code>&amp;rsquo;s windowing function, and
each individual window contains a finite number of elements. Grouping transforms
then consider each &lt;code>PCollection&lt;/code>&amp;rsquo;s elements on a per-window basis. &lt;code>GroupByKey&lt;/code>,
for example, implicitly groups the elements of a &lt;code>PCollection&lt;/code> by &lt;em>key and
window&lt;/em>.&lt;/p>
&lt;p>&lt;strong>Caution:&lt;/strong> Beam&amp;rsquo;s default windowing behavior is to assign all elements of a
&lt;code>PCollection&lt;/code> to a single, global window and discard late data, &lt;em>even for
unbounded &lt;code>PCollection&lt;/code>s&lt;/em>. Before you use a grouping transform such as
&lt;code>GroupByKey&lt;/code> on an unbounded &lt;code>PCollection&lt;/code>, you must do at least one of the
following:&lt;/p>
&lt;ul>
&lt;li>Set a non-global windowing function. See &lt;a href="#setting-your-pcollections-windowing-function">Setting your PCollection&amp;rsquo;s
windowing function&lt;/a>.&lt;/li>
&lt;li>Set a non-default &lt;a href="#triggers">trigger&lt;/a>. This allows the global window to emit
results under other conditions, since the condition that the default behavior
waits for (all data having arrived) is never met for an unbounded &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>If you don&amp;rsquo;t set a non-global windowing function or a non-default trigger for
your unbounded &lt;code>PCollection&lt;/code> and subsequently use a grouping transform such as
&lt;code>GroupByKey&lt;/code> or &lt;code>Combine&lt;/code>, your pipeline will generate an error upon
construction and your job will fail.&lt;/p>
&lt;h4 id="windowing-constraints">8.1.1. Windowing constraints&lt;/h4>
&lt;p>After you set the windowing function for a &lt;code>PCollection&lt;/code>, the elements&amp;rsquo; windows
are used the next time you apply a grouping transform to that &lt;code>PCollection&lt;/code>.
Window grouping occurs on an as-needed basis. If you set a windowing function
using the &lt;code>Window&lt;/code> transform, each element is assigned to a window, but the
windows are not considered until &lt;code>GroupByKey&lt;/code> or &lt;code>Combine&lt;/code> aggregates across a
window and key. This deferred use of windows determines which transforms in
your pipeline windowing actually affects. Consider the example pipeline in the
figure below:&lt;/p>
&lt;p>&lt;img src="/images/windowing-pipeline-unbounded.svg" alt="Diagram of pipeline applying windowing">&lt;/p>
&lt;p>&lt;strong>Figure 3:&lt;/strong> Pipeline applying windowing&lt;/p>
&lt;p>In the above pipeline, we create an unbounded &lt;code>PCollection&lt;/code> by reading a set of
key/value pairs using &lt;code>KafkaIO&lt;/code>, and then apply a windowing function to that
collection using the &lt;code>Window&lt;/code> transform. We then apply a &lt;code>ParDo&lt;/code> to the
collection, and then later group the result of that &lt;code>ParDo&lt;/code> using &lt;code>GroupByKey&lt;/code>.
The windowing function has no effect on the &lt;code>ParDo&lt;/code> transform, because the
windows are not actually used until they&amp;rsquo;re needed for the &lt;code>GroupByKey&lt;/code>.
Subsequent transforms, however, are applied to the result of the &lt;code>GroupByKey&lt;/code> &amp;ndash;
data is grouped by both key and window.&lt;/p>
&lt;h4 id="windowing-bounded-collections">8.1.2. Windowing with bounded PCollections&lt;/h4>
&lt;p>You can use windowing with fixed-size data sets in &lt;strong>bounded&lt;/strong> &lt;code>PCollection&lt;/code>s.
However, note that windowing considers only the implicit timestamps attached to
each element of a &lt;code>PCollection&lt;/code>, and data sources that create fixed data sets
(such as &lt;code>TextIO&lt;/code>) assign the same timestamp to every element. This means that
all the elements are by default part of a single, global window.&lt;/p>
&lt;p>To use windowing with fixed data sets, you can assign your own timestamps to
each element. To assign timestamps to elements, use a &lt;code>ParDo&lt;/code> transform with a
&lt;code>DoFn&lt;/code> that outputs each element with a new timestamp (for example, the
&lt;a href="https://beam.apache.org/releases/javadoc/2.56.0/index.html?org/apache/beam/sdk/transforms/WithTimestamps.html">WithTimestamps&lt;/a>
transform in the Beam SDK for Java).&lt;/p>
&lt;p>To illustrate how windowing with a bounded &lt;code>PCollection&lt;/code> can affect how your
pipeline processes data, consider the following pipeline:&lt;/p>
&lt;p>&lt;img src="/images/unwindowed-pipeline-bounded.svg" alt="Diagram of GroupByKey and ParDo without windowing, on a bounded collection">&lt;/p>
&lt;p>&lt;strong>Figure 4:&lt;/strong> &lt;code>GroupByKey&lt;/code> and &lt;code>ParDo&lt;/code> without windowing, on a bounded collection.&lt;/p>
&lt;p>In the above pipeline, we create a bounded &lt;code>PCollection&lt;/code> by reading lines from a
file using &lt;code>TextIO&lt;/code>. We then group the collection using &lt;code>GroupByKey&lt;/code>,
and apply a &lt;code>ParDo&lt;/code> transform to the grouped &lt;code>PCollection&lt;/code>. In this example, the
&lt;code>GroupByKey&lt;/code> creates a collection of unique keys, and then &lt;code>ParDo&lt;/code> gets applied
exactly once per key.&lt;/p>
&lt;p>Note that even if you don&amp;rsquo;t set a windowing function, there is still a window &amp;ndash;
all elements in your &lt;code>PCollection&lt;/code> are assigned to a single global window.&lt;/p>
&lt;p>Now, consider the same pipeline, but using a windowing function:&lt;/p>
&lt;p>&lt;img src="/images/windowing-pipeline-bounded.svg" alt="Diagram of GroupByKey and ParDo with windowing, on a bounded collection">&lt;/p>
&lt;p>&lt;strong>Figure 5:&lt;/strong> &lt;code>GroupByKey&lt;/code> and &lt;code>ParDo&lt;/code> with windowing, on a bounded collection.&lt;/p>
&lt;p>As before, the pipeline creates a bounded &lt;code>PCollection&lt;/code> by reading lines from a
file. We then set a &lt;a href="#setting-your-pcollections-windowing-function">windowing function&lt;/a>
for that &lt;code>PCollection&lt;/code>. The &lt;code>GroupByKey&lt;/code> transform groups the elements of the
&lt;code>PCollection&lt;/code> by both key and window, based on the windowing function. The
subsequent &lt;code>ParDo&lt;/code> transform gets applied multiple times per key, once for each
window.&lt;/p>
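&lt;p>The effect of grouping by both key and window can be sketched in plain Python.
This is an illustration of the idea only, not Beam API: the function name
&lt;code>group_by_key_and_window&lt;/code>, the tuple layout, and the 60-second fixed
window size are assumptions made for the example.&lt;/p>

```python
from collections import defaultdict


def group_by_key_and_window(elements, size=60):
    """Group (key, value, timestamp) tuples by key and fixed-window start."""
    groups = defaultdict(list)
    for key, value, timestamp in elements:
        # Start of the fixed window of length `size` that contains the timestamp.
        window_start = timestamp - (timestamp % size)
        groups[(key, window_start)].append(value)
    return dict(groups)


# Hypothetical input: (key, value, timestamp in seconds).
elements = [("a", 1, 5), ("a", 2, 65), ("b", 3, 10), ("a", 4, 20)]
grouped = group_by_key_and_window(elements)
# Key "a" lands in two windows, so a per-group step would run twice for it.
print(grouped)
```

&lt;p>With a single global window, the same input would produce exactly one group per key.&lt;/p>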
&lt;h3 id="provided-windowing-functions">8.2. Provided windowing functions&lt;/h3>
&lt;p>You can define different kinds of windows to divide the elements of your
&lt;code>PCollection&lt;/code>. Beam provides several windowing functions, including:&lt;/p>
&lt;ul>
&lt;li>Fixed Time Windows&lt;/li>
&lt;li>Sliding Time Windows&lt;/li>
&lt;li>Per-Session Windows&lt;/li>
&lt;li>Single Global Window&lt;/li>
&lt;li>Calendar-based Windows (not supported by the Beam SDK for Python or Go)&lt;/li>
&lt;/ul>
&lt;p>You can also define your own &lt;code>WindowFn&lt;/code> if you have a more complex need.&lt;/p>
&lt;p>Note that each element can logically belong to more than one window, depending
on the windowing function you use. Sliding time windowing, for example, can
create overlapping windows wherein a single element can be assigned to multiple
windows. However, each element in a &lt;code>PCollection&lt;/code> can only be in one window, so
if an element is assigned to multiple windows, the element is conceptually
duplicated into each of the windows, and each copy is identical except for its
window.&lt;/p>
&lt;h4 id="fixed-time-windows">8.2.1. Fixed time windows&lt;/h4>
&lt;p>The simplest form of windowing is using &lt;strong>fixed time windows&lt;/strong>: given a
timestamped &lt;code>PCollection&lt;/code> which might be continuously updating, each window
might capture (for example) all elements with timestamps that fall into a
30-second interval.&lt;/p>
&lt;p>A fixed time window represents a non-overlapping time interval of consistent
duration in the data stream. Consider windows with a 30-second duration: all
of the elements in your unbounded &lt;code>PCollection&lt;/code> with timestamp values from
0:00:00 up to (but not including) 0:00:30 belong to the first window, elements
with timestamp values from 0:00:30 up to (but not including) 0:01:00 belong to
the second window, and so on.&lt;/p>
&lt;p>&lt;img src="/images/fixed-time-windows.png" alt="Diagram of fixed time windows, 30s in duration">&lt;/p>
&lt;p>&lt;strong>Figure 6:&lt;/strong> Fixed time windows, 30s in duration.&lt;/p>
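&lt;p>The assignment rule described above can be written as a few lines of arithmetic.
The following is a minimal plain-Python sketch, not Beam API; the function name
&lt;code>fixed_window&lt;/code> is hypothetical, and timestamps are assumed to be
whole seconds.&lt;/p>

```python
def fixed_window(timestamp, size=30):
    """Return the (start, end) bounds of the fixed window containing timestamp.

    The start is inclusive and the end is exclusive.
    """
    start = timestamp - (timestamp % size)
    return (start, start + size)


print(fixed_window(0))   # (0, 30)
print(fixed_window(29))  # (0, 30)
print(fixed_window(30))  # (30, 60)
```

&lt;p>An element with timestamp 30 falls into the second window because the end of
each window is exclusive.&lt;/p>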
&lt;h4 id="sliding-time-windows">8.2.2. Sliding time windows&lt;/h4>
&lt;p>A &lt;strong>sliding time window&lt;/strong> also represents time intervals in the data stream;
however, sliding time windows can overlap. For example, each window might
capture 60 seconds&amp;rsquo; worth of data, but a new window starts every 30 seconds.
The interval between the starts of consecutive sliding windows is called the &lt;em>period&lt;/em>.
Therefore, our example would have a window &lt;em>duration&lt;/em> of 60 seconds and a
&lt;em>period&lt;/em> of 30 seconds.&lt;/p>
&lt;p>Because multiple windows overlap, most elements in a data set will belong to
more than one window. This kind of windowing is useful for taking running
averages of data; using sliding time windows, you can compute a running average
of the past 60 seconds&amp;rsquo; worth of data, updated every 30 seconds, in our
example.&lt;/p>
&lt;p>&lt;img src="/images/sliding-time-windows.png" alt="Diagram of sliding time windows, with 1 minute window duration and 30s window period">&lt;/p>
&lt;p>&lt;strong>Figure 7:&lt;/strong> Sliding time windows, with 1 minute window duration and 30s window
period.&lt;/p>
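&lt;p>To see why an element belongs to more than one sliding window, the sketch
below lists every window that contains a given timestamp. It is plain Python
rather than Beam API; the function name &lt;code>sliding_windows&lt;/code> is
hypothetical, and the defaults mirror the 60-second duration and 30-second
period used in this section.&lt;/p>

```python
def sliding_windows(timestamp, size=60, period=30):
    """Return every (start, end) window that contains timestamp.

    Window starts are multiples of `period`; the end of a window is exclusive.
    """
    # Latest window start that is not after the timestamp.
    start = timestamp - (timestamp % period)
    windows = []
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


print(sliding_windows(45))  # [(0, 60), (30, 90)]
```

&lt;p>When the duration is a multiple of the period, each element belongs to
duration / period windows, which is two in this example.&lt;/p>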
&lt;h4 id="session-windows">8.2.3. Session windows&lt;/h4>
&lt;p>A &lt;strong>session window&lt;/strong> function defines windows that contain elements that are
within a certain gap duration of another element. Session windowing applies on a
per-key basis and is useful for data that is irregularly distributed with
respect to time. For example, a data stream representing user mouse activity may
have long periods of idle time interspersed with high concentrations of clicks.
If an element arrives after a gap longer than the specified minimum gap
duration, it starts a new window.&lt;/p>
&lt;p>&lt;img src="/images/session-windows.png" alt="Diagram of session windows with a minimum gap duration">&lt;/p>
&lt;p>&lt;strong>Figure 8:&lt;/strong> Session windows, with a minimum gap duration. Note how each data key
has different windows, according to its data distribution.&lt;/p>
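&lt;p>The gap rule can be sketched for the timestamps of a single key as follows.
This is plain Python, not Beam API; the function name
&lt;code>session_windows&lt;/code> and the 10-second gap are assumptions made for
the example. In a real pipeline the same logic applies independently to each key.&lt;/p>

```python
def session_windows(timestamps, gap=10):
    """Split one key's timestamps into sessions separated by at least `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        # Extend the current session if this element is within `gap`
        # of the previous one; otherwise start a new session.
        if sessions and gap > ts - sessions[-1][-1]:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


print(session_windows([1, 4, 9, 30, 33]))  # [[1, 4, 9], [30, 33]]
```

&lt;p>The 21-second pause between timestamps 9 and 30 exceeds the gap, so the
element at 30 opens a second session.&lt;/p>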
&lt;h4 id="single-global-window">8.2.4. The single global window&lt;/h4>
&lt;p>By default, all data in a &lt;code>PCollection&lt;/code> is assigned to the single global window,
and late data is discarded. If your data set is of a fixed size, you can use the
global window default for your &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>You can use the single global window if you are working with an unbounded data set
(e.g. from a streaming data source), but use caution when applying aggregating
transforms such as &lt;code>GroupByKey&lt;/code> and &lt;code>Combine&lt;/code>. The single global window with a
default trigger generally requires the entire data set to be available before
processing, which is not possible with continuously updating data. To perform
aggregations on an unbounded &lt;code>PCollection&lt;/code> that uses global windowing, you
should specify a non-default trigger for that &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;h3 id="setting-your-pcollections-windowing-function">8.3. Setting your PCollection&amp;rsquo;s windowing function&lt;/h3>
&lt;p>You can set the windowing function for a &lt;code>PCollection&lt;/code> by applying the &lt;code>Window&lt;/code>
transform. When you apply the &lt;code>Window&lt;/code> transform, you must provide a &lt;code>WindowFn&lt;/code>.
The &lt;code>WindowFn&lt;/code> determines the windowing function your &lt;code>PCollection&lt;/code> will use for
subsequent grouping transforms, such as a fixed or sliding time window.&lt;/p>
&lt;p>When you set a windowing function, you may also want to set a trigger for your
&lt;code>PCollection&lt;/code>. The trigger determines when each individual window is aggregated
and emitted, and helps refine how the windowing function performs with respect
to late data and computing early results. See the &lt;a href="#triggers">triggers&lt;/a> section
for more information.&lt;/p>
&lt;p class="language-yaml">In Beam YAML, windowing specifications can also be placed directly on any
transform rather than requiring an explicit &lt;code>WindowInto&lt;/code> transform.&lt;/p>
&lt;h4 id="using-fixed-time-windows">8.3.1. Fixed time windows&lt;/h4>
&lt;p>The following example code shows how to apply &lt;code>Window&lt;/code> to divide a &lt;code>PCollection&lt;/code>
into fixed windows, each 60 seconds in length:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">items&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">fixedWindowedItems&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">items&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">60&lt;/span>&lt;span class="o">))));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">window&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">fixed_windowed_items&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">items&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;window&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="p">)))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">fixedWindowedItems&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewFixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Second&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">items&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">windowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">windowings&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">fixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="p">)))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">WindowInto&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">windowing&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">fixed&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">size&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">60s&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="using-sliding-time-windows">8.3.2. Sliding time windows&lt;/h4>
&lt;p>The following example code shows how to apply &lt;code>Window&lt;/code> to divide a &lt;code>PCollection&lt;/code>
into sliding time windows. Each window is 30 seconds in length, and a new window
begins every five seconds:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">items&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">slidingWindowedItems&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">items&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">SlidingWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">30&lt;/span>&lt;span class="o">)).&lt;/span>&lt;span class="na">every&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">5&lt;/span>&lt;span class="o">))));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">window&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">sliding_windowed_items&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">items&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;window&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">SlidingWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">30&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">5&lt;/span>&lt;span class="p">)))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">slidingWindowedItems&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewSlidingWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Second&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">30&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Second&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">items&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">windowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">windowings&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">slidingWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">30&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">5&lt;/span>&lt;span class="p">)))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">WindowInto&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">windowing&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">sliding&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">size&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">5m&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">period&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">30s&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
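&lt;p>To make the overlap concrete, the following is a minimal sketch in plain Python (it does not call the Beam SDK) of which 30-second windows, starting every five seconds, contain a given event timestamp. The helper name &lt;code>sliding_windows_for&lt;/code> is hypothetical, and timestamps are assumed to be whole seconds with windows aligned to multiples of the period.&lt;/p>

```python
def sliding_windows_for(timestamp, size=30, period=5):
    """Return the [start, end) sliding windows that contain `timestamp`.

    All values are in seconds. Windows are assumed to start at multiples
    of `period` (an illustrative simplification, with no offset).
    """
    # Start of the latest window that begins at or before the timestamp.
    start = timestamp - (timestamp % period)
    windows = []
    # Walk backwards while the window still covers the timestamp.
    while start + size > timestamp:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


# An element stamped at second 42 falls into size / period = 6 windows.
windows = sliding_windows_for(42)
```

&lt;p>Because each element belongs to several windows, an aggregation over sliding windows counts the element once per window that contains it.&lt;/p>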
&lt;h4 id="using-session-windows">8.3.3. Session windows&lt;/h4>
&lt;p>The following example code shows how to apply &lt;code>Window&lt;/code> to divide a &lt;code>PCollection&lt;/code>
into session windows, where each session must be separated by a time gap of at
least 10 minutes (600 seconds):&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">items&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">sessionWindowedItems&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">items&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Sessions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">withGapDuration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">600&lt;/span>&lt;span class="o">))));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">window&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">session_windowed_items&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">items&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;window&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Sessions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>&lt;span class="p">)))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">sessionWindowedItems&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewSessions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">600&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Second&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">items&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">windowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">windowings&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">sessions&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>&lt;span class="p">)))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">WindowInto&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">windowing&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">sessions&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">gap&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">60s&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Note that the sessions are per-key — each key in the collection will have its
own session groupings depending on the data distribution.&lt;/p>
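&lt;p>The per-key behavior can be illustrated with a small sketch in plain Python (not the Beam SDK). The function &lt;code>sessionize&lt;/code> and the sample events are hypothetical. The sketch assumes integer timestamps in seconds and starts a new session whenever an event arrives 600 seconds or more after the previous event for the same key.&lt;/p>

```python
from collections import defaultdict


def sessionize(events, gap=600):
    """Group (key, timestamp) pairs into per-key sessions.

    A new session starts when an event is `gap` seconds or more after
    the previous event for the same key.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()
        groups = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - groups[-1][-1] >= gap:
                groups.append([ts])    # gap reached: open a new session
            else:
                groups[-1].append(ts)  # still inside the current session
        sessions[key] = groups
    return sessions


# Hypothetical sample data: key "a" splits into two sessions, key "b" has one.
events = [("a", 0), ("a", 300), ("a", 1000), ("b", 100), ("b", 650)]
sessions = sessionize(events)
```

&lt;p>The same timestamps produce different groupings for each key, which is why the number and length of sessions depend on how the data is distributed.&lt;/p>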
&lt;h4 id="using-single-global-window">8.3.4. Single global window&lt;/h4>
&lt;p>If your &lt;code>PCollection&lt;/code> is bounded (the size is fixed), you can assign all the
elements to a single global window. The following example code shows how to set
a single global window for a &lt;code>PCollection&lt;/code>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">items&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">batchItems&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">items&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">GlobalWindows&lt;/span>&lt;span class="o">()));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">window&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">global_windowed_items&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">items&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;window&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GlobalWindows&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">globalWindowedItems&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewGlobalWindows&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">items&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">windowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">windowings&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">globalWindows&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">WindowInto&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">windowing&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">global&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="watermarks-and-late-data">8.4. Watermarks and late data&lt;/h3>
&lt;p>In any data processing system, there is a certain amount of lag between the time
a data event occurs (the &amp;ldquo;event time&amp;rdquo;, determined by the timestamp on the data
element itself) and the time the actual data element gets processed at any stage
in your pipeline (the &amp;ldquo;processing time&amp;rdquo;, determined by the clock on the system
processing the element). In addition, there are no guarantees that data events
will appear in your pipeline in the same order that they were generated.&lt;/p>
&lt;p>For example, let&amp;rsquo;s say we have a &lt;code>PCollection&lt;/code> that&amp;rsquo;s using fixed-time
windowing, with windows that are five minutes long. For each window, Beam must
collect all the data with an &lt;em>event time&lt;/em> timestamp in the given window range
(between 0:00 and 4:59 in the first window, for instance). Data with timestamps
outside that range (data from 5:00 or later) belong to a different window.&lt;/p>
&lt;p>However, data isn&amp;rsquo;t always guaranteed to arrive in a pipeline in time order, or
to always arrive at predictable intervals. Beam tracks a &lt;em>watermark&lt;/em>, which is
the system&amp;rsquo;s notion of when all data in a certain window can be expected to have
arrived in the pipeline. Once the watermark progresses past the end of a window,
any further element that arrives with a timestamp in that window is considered
&lt;strong>late data&lt;/strong>.&lt;/p>
&lt;p>Continuing our example, suppose we have a simple watermark that assumes
approximately 30 seconds of lag between the data timestamps (the event time)
and the time the data appears in the pipeline (the processing time). Beam would
then close the first window at 5:30. If a data record arrives at 5:34 with a
timestamp that puts it in the 0:00-4:59 window (say, 3:38), that record is late
data.&lt;/p>
&lt;p>Note: For simplicity, we&amp;rsquo;ve assumed that we&amp;rsquo;re using a very straightforward
watermark that estimates the lag time. In practice, your &lt;code>PCollection&lt;/code>&amp;rsquo;s data
source determines the watermark, and watermarks can be more precise or complex.&lt;/p>
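&lt;p>The arithmetic in this example can be sketched in plain Python. This is only an illustration of the simplified watermark described above (processing time minus a fixed 30-second lag), not how Beam or any particular source computes watermarks. All function names here are hypothetical.&lt;/p>

```python
LAG_SECONDS = 30        # assumed fixed lag of the simplified watermark
WINDOW_SIZE = 5 * 60    # five-minute fixed windows, in seconds


def mmss(text):
    """Convert an 'M:SS' string to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def watermark_at(processing_time):
    """Event time up to which data is assumed to have arrived."""
    return processing_time - LAG_SECONDS


def window_end(event_time):
    """Exclusive end of the fixed window that contains event_time."""
    return (event_time // WINDOW_SIZE + 1) * WINDOW_SIZE


def is_late(event_time, arrival_time):
    """True if the watermark has already reached the end of the window."""
    return watermark_at(arrival_time) >= window_end(event_time)


# The record stamped 3:38 is late if it arrives at 5:34, but not at 5:10.
late_at_534 = is_late(mmss("3:38"), mmss("5:34"))
late_at_510 = is_late(mmss("3:38"), mmss("5:10"))
```

&lt;p>With a 30-second lag, the watermark reaches the 5:00 window boundary at processing time 5:30, which matches the point at which the first window closes in the example.&lt;/p>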
&lt;p>Beam&amp;rsquo;s default windowing configuration tries to determine when all data has
arrived (based on the type of data source) and then advances the watermark past
the end of the window. This default configuration does &lt;em>not&lt;/em> allow late data.
&lt;a href="#triggers">Triggers&lt;/a> allow you to modify and refine the windowing strategy for
a &lt;code>PCollection&lt;/code>. You can use triggers to decide when each individual window
aggregates and reports its results, including how the window emits late
elements.&lt;/p>
&lt;h4 id="managing-late-data">8.4.1. Managing late data&lt;/h4>
&lt;p>You can allow late data by invoking the &lt;code>.withAllowedLateness&lt;/code> operation when
you set your &lt;code>PCollection&lt;/code>&amp;rsquo;s windowing strategy. The following code example
demonstrates a windowing strategy that will allow late data up to two days after
the end of a window.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">items&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">fixedWindowedItems&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">items&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withAllowedLateness&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardDays&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">2&lt;/span>&lt;span class="o">)));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">Initial&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">trigger&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">trigger_fn&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">accumulation_mode&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">accumulation_mode&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timestamp_combiner&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">timestamp_combiner&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">allowed_lateness&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">seconds&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="mi">24&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="c1"># 2 days&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">windowedItems&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewFixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">items&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AllowedLateness&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="mi">24&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Hour&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="c1">// 2 days
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>When you set &lt;code>.withAllowedLateness&lt;/code> on a &lt;code>PCollection&lt;/code>, that allowed lateness
propagates forward to any subsequent &lt;code>PCollection&lt;/code> derived from the first
&lt;code>PCollection&lt;/code> you applied allowed lateness to. If you want to change the allowed
lateness later in your pipeline, you must do so explicitly by applying
&lt;code>Window.configure().withAllowedLateness()&lt;/code>.&lt;/p>
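&lt;p>As a rough illustration of what allowed lateness means for an individual element, the following plain-Python sketch classifies an element by comparing the watermark with the end of its window. It assumes one-minute fixed windows and two days of allowed lateness, as in the example above, and it ignores trigger behavior and exact boundary handling. The function &lt;code>classify&lt;/code> is hypothetical and is not part of the Beam SDK.&lt;/p>

```python
ALLOWED_LATENESS = 2 * 24 * 60 * 60   # two days, in seconds
WINDOW_SIZE = 60                      # one-minute fixed windows


def classify(event_time, watermark):
    """Classify an element as 'on_time', 'late', or 'dropped'.

    event_time: the element timestamp, in seconds.
    watermark: the watermark when the element arrives, in seconds.
    """
    end = (event_time // WINDOW_SIZE + 1) * WINDOW_SIZE
    if watermark >= end + ALLOWED_LATENESS:
        return "dropped"   # the window has expired
    if watermark >= end:
        return "late"      # past the window end, within allowed lateness
    return "on_time"


# An element stamped at second 30 belongs to the window ending at 60.
status_on_time = classify(30, watermark=45)
status_late = classify(30, watermark=3600)
status_dropped = classify(30, watermark=60 + ALLOWED_LATENESS)
```

&lt;p>Larger values of allowed lateness keep window state around longer, so the value is a trade-off between completeness of results and resource usage.&lt;/p>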
&lt;h3 id="adding-timestamps-to-a-pcollections-elements">8.5. Adding timestamps to a PCollection&amp;rsquo;s elements&lt;/h3>
&lt;p>An unbounded source provides a timestamp for each element. Depending on your
unbounded source, you may need to configure how the timestamp is extracted from
the raw data stream.&lt;/p>
&lt;p>However, bounded sources (such as a file from &lt;code>TextIO&lt;/code>) do not provide
timestamps. If you need timestamps, you must add them to your &lt;code>PCollection&lt;/code>’s
elements.&lt;/p>
&lt;p>You can assign new timestamps to the elements of a &lt;code>PCollection&lt;/code> by applying a
&lt;a href="#pardo">ParDo&lt;/a> transform that outputs new elements with timestamps that you
set.&lt;/p>
&lt;p>For example, suppose your pipeline reads log records from an input file, and
each log record includes a timestamp field. Because the records come from a
file, the file source doesn&amp;rsquo;t assign timestamps automatically.
You can parse the timestamp field from each record and use a &lt;code>ParDo&lt;/code> transform
with a &lt;code>DoFn&lt;/code> to attach the timestamps to each element in your &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">LogEntry&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">unstampedLogs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">LogEntry&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">stampedLogs&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">unstampedLogs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">LogEntry&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LogEntry&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">LogEntry&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">LogEntry&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Extract the timestamp from log entry we&amp;#39;re currently processing.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">logTimeStamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">extractTimeStampFromLogEntry&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Use OutputReceiver.outputWithTimestamp (rather than
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// OutputReceiver.output) to emit the entry with timestamp attached.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">outputWithTimestamp&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">logTimeStamp&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">AddTimestampDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Extract the numeric Unix seconds-since-epoch timestamp to be&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># associated with the current log entry.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">unix_timestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">extract_timestamp_from_log_entry&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Wrap and emit the current entry and new timestamp in a&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># TimestampedValue.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampedValue&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">unix_timestamp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">timestamped_items&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">items&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;timestamp&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">AddTimestampDoFn&lt;/span>&lt;span class="p">())&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// AddTimestampDoFn extracts an event time from a LogEntry.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="nf">AddTimestampDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">element&lt;/span> &lt;span class="nx">LogEntry&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">LogEntry&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">et&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">extractEventTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">element&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Defining an emitter with beam.EventTime as the first parameter
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// allows the DoFn to set the event time for the emitted element.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">mtime&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">FromTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">et&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">element&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Use the DoFn with ParDo as normal.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">stampedLogs&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">AddTimestampDoFn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">unstampedLogs&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-yaml snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="cl">&lt;span class="nt">type&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">AssignTimestamps&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">config&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">language&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">python&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">timestamp&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">callable&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">|&lt;/span>&lt;span class="sd">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sd"> import datetime
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sd">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sd"> def extract_timestamp(x):
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sd"> raw = datetime.datetime.strptime(
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sd"> x.external_timestamp_field, &amp;#34;%Y-%m-%d&amp;#34;)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sd"> return raw.astimezone(datetime.timezone.utc)&lt;/span>&lt;span class="w"> &lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h2 id="triggers">9. Triggers&lt;/h2>
&lt;span class="language-go">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> The Trigger API in the Beam SDK for Go is currently experimental and subject to change.&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;p>When collecting and grouping data into windows, Beam uses &lt;strong>triggers&lt;/strong> to
determine when to emit the aggregated results of each window (referred to as a
&lt;em>pane&lt;/em>). If you use Beam&amp;rsquo;s default windowing configuration and &lt;a href="#default-trigger">default
trigger&lt;/a>, Beam outputs the aggregated result when it
&lt;a href="#watermarks-and-late-data">estimates all data has arrived&lt;/a>, and discards all
subsequent data for that window.&lt;/p>
&lt;p>You can set triggers for your &lt;code>PCollection&lt;/code>s to change this default behavior.
Beam provides a number of pre-built triggers that you can set:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Event time triggers&lt;/strong>. These triggers operate on the event time, as
indicated by the timestamp on each data element. Beam&amp;rsquo;s default trigger is
event time-based.&lt;/li>
&lt;li>&lt;strong>Processing time triggers&lt;/strong>. These triggers operate on the processing time
&amp;ndash; the time when the data element is processed at any given stage in the
pipeline.&lt;/li>
&lt;li>&lt;strong>Data-driven triggers&lt;/strong>. These triggers operate by examining the data as it
arrives in each window, and firing when that data meets a certain property.
Currently, data-driven triggers only support firing after a certain number
of data elements.&lt;/li>
&lt;li>&lt;strong>Composite triggers&lt;/strong>. These triggers combine multiple triggers in various
ways.&lt;/li>
&lt;/ul>
&lt;p>At a high level, triggers provide two additional capabilities compared to simply
outputting at the end of a window:&lt;/p>
&lt;ul>
&lt;li>Triggers allow Beam to emit early results, before all the data in a given
window has arrived. For example, a trigger can fire after a certain amount of
time elapses, or after a certain number of elements arrive.&lt;/li>
&lt;li>Triggers allow processing of late data by triggering after the event time
watermark passes the end of the window.&lt;/li>
&lt;/ul>
&lt;p>These capabilities allow you to control the flow of your data and balance
between different factors depending on your use case:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Completeness:&lt;/strong> How important is it to have all of your data before you
compute your result?&lt;/li>
&lt;li>&lt;strong>Latency:&lt;/strong> How long do you want to wait for data? For example, do you wait
until you think you have all data? Do you process data as it arrives?&lt;/li>
&lt;li>&lt;strong>Cost:&lt;/strong> How much compute power/money are you willing to spend to lower the
latency?&lt;/li>
&lt;/ul>
&lt;p>For example, a system that requires time-sensitive updates might use a strict
time-based trigger that emits a window every &lt;em>N&lt;/em> seconds, valuing promptness
over data completeness. A system that values data completeness more than the
exact timing of results might choose to use Beam&amp;rsquo;s default trigger, which fires
at the end of the window.&lt;/p>
&lt;p>You can also set a trigger for an unbounded &lt;code>PCollection&lt;/code> that uses a &lt;a href="#windowing">single
global window for its windowing function&lt;/a>. This can be useful when
you want your pipeline to provide periodic updates on an unbounded data set —
for example, a running average of all data provided to the present time, updated
every N seconds or every N elements.&lt;/p>
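&lt;p>As a rough illustration of the periodic-update idea, the following plain-Python sketch models a single global window whose trigger fires every N elements and emits the running average of everything seen so far. It does not use the Beam API; the function name &lt;code>running_average_panes&lt;/code> and its parameters are hypothetical and exist only for this example.&lt;/p>

```python
from typing import Iterable, List


def running_average_panes(values: Iterable[float], every_n: int) -> List[float]:
    """Toy model of a global window with a repeated element-count trigger.

    Each time `every_n` new elements have arrived, emit the average of all
    elements seen so far (accumulating semantics). This is a simulation,
    not Beam code.
    """
    total = 0.0
    count = 0
    panes = []
    for value in values:
        total += value
        count += 1
        # The modeled trigger fires once every `every_n` elements.
        if count % every_n == 0:
            panes.append(total / count)
    return panes


print(running_average_panes([2, 4, 6, 8, 10, 12, 14], every_n=3))
# prints [4.0, 7.0]
```

&lt;p>The seventh element has not produced an update yet, because the modeled trigger only fires on every third element.&lt;/p>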
&lt;h3 id="event-time-triggers">9.1. Event time triggers&lt;/h3>
&lt;p>The &lt;code>AfterWatermark&lt;/code> trigger operates on &lt;em>event time&lt;/em>. The &lt;code>AfterWatermark&lt;/code>
trigger emits the contents of a window after the
&lt;a href="#watermarks-and-late-data">watermark&lt;/a> passes the end of the window, based on the
timestamps attached to the data elements. The watermark is a global progress
metric, and is Beam&amp;rsquo;s notion of input completeness within your pipeline at any
given point. &lt;span class="language-java">&lt;code>AfterWatermark.pastEndOfWindow()&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>AfterWatermark&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>trigger.AfterEndOfWindow&lt;/code>&lt;/span> &lt;em>only&lt;/em> fires when the
watermark passes the end of the window.&lt;/p>
&lt;p>In addition, you can configure triggers that fire if your pipeline receives data
before or after the end of the window.&lt;/p>
&lt;p>The following example shows a billing scenario, and uses both early and late
firings:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Create a bill at the end of the month.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">AfterWatermark&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">pastEndOfWindow&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// During the month, get near real-time estimates.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">withEarlyFirings&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">AfterProcessingTime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">pastFirstElementInPane&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">plusDelayOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Fire on any late data so the bill can be corrected.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">withLateFirings&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">AfterPane&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">elementCountAtLeast&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">AfterWatermark&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">early&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">delay&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">late&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">AfterCount&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">trigger&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">trigger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AfterEndOfWindow&lt;/span>&lt;span class="p">().&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">EarlyFiring&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">trigger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AfterProcessingTime&lt;/span>&lt;span class="p">().&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">PlusDelay&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">60&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Second&lt;/span>&lt;span class="p">)).&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">LateFiring&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">trigger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Repeat&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">trigger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AfterCount&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="default-trigger">9.1.1. Default trigger&lt;/h4>
&lt;p>The default trigger for a &lt;code>PCollection&lt;/code> is based on event time, and emits the
results of the window when Beam&amp;rsquo;s watermark passes the end of the window,
and then fires each time late data arrives.&lt;/p>
&lt;p>However, if you are using both the default windowing configuration and the
default trigger, the default trigger emits exactly once, and late data is
discarded. This is because the default windowing configuration has an allowed
lateness value of 0. See &lt;a href="#handling-late-data">Handling late data&lt;/a> for information about
modifying this behavior.&lt;/p>
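&lt;p>The interaction between the default trigger and allowed lateness can be sketched with a small plain-Python model of one window. This is a simplified simulation under stated assumptions, not Beam code: the function &lt;code>default_trigger_panes&lt;/code>, the event tuples, and the integer time units are all hypothetical, every element is assumed to belong to the same window, and panes are reported in discarding style.&lt;/p>

```python
def default_trigger_panes(events, window_end, allowed_lateness=0):
    """Toy model of the default trigger for a single window.

    `events` is a list of ("element", value) or ("watermark", time) tuples
    in arrival order. Returns a list of (label, elements) panes.
    """
    watermark = 0
    buffered = []
    panes = []
    fired_on_time = False
    for kind, payload in events:
        if kind == "watermark":
            watermark = max(watermark, payload)
            # Fire once when the watermark passes the end of the window.
            if not fired_on_time and watermark >= window_end:
                panes.append(("on_time", buffered))
                buffered = []
                fired_on_time = True
        elif not fired_on_time:
            buffered.append(payload)
        elif window_end + allowed_lateness > watermark:
            # Late element that is still within the allowed lateness.
            panes.append(("late", [payload]))
        # Otherwise the late element is dropped.
    return panes


events = [
    ("element", 5), ("element", 8), ("watermark", 600),
    ("element", 3), ("watermark", 700), ("element", 9),
]
print(default_trigger_panes(events, window_end=600))
# prints [('on_time', [5, 8])]
print(default_trigger_panes(events, window_end=600, allowed_lateness=60))
# prints [('on_time', [5, 8]), ('late', [3])]
```

&lt;p>With an allowed lateness of 0, the model fires exactly once and drops both late elements. With an allowed lateness of 60, the element that arrives while the watermark is at 600 produces a late pane, and the one that arrives after the watermark reaches 700 is dropped.&lt;/p>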
&lt;h3 id="processing-time-triggers">9.2. Processing time triggers&lt;/h3>
&lt;p>The &lt;code>AfterProcessingTime&lt;/code> trigger operates on &lt;em>processing time&lt;/em>. For example,
the &lt;span class="language-java">&lt;code>AfterProcessingTime.pastFirstElementInPane()&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>AfterProcessingTime&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>trigger.AfterProcessingTime()&lt;/code>&lt;/span> trigger emits a window
after a certain amount of processing time has passed since data was received.
The processing time is determined by the system clock, rather than the data
element&amp;rsquo;s timestamp.&lt;/p>
&lt;p>The &lt;code>AfterProcessingTime&lt;/code> trigger is useful for triggering early results from a
window, particularly a window with a large time frame such as a single global
window.&lt;/p>
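&lt;p>To make the timing concrete, here is a minimal plain-Python model of a processing-time trigger that fires a fixed delay after the first element in the pane arrives. It is only a sketch: &lt;code>first_pane&lt;/code> and its arguments are hypothetical names, times are arbitrary integer seconds, and the choice to exclude an element that arrives exactly at the firing time is an assumption made for this example.&lt;/p>

```python
def first_pane(arrivals, delay):
    """Toy model of a one-shot processing-time trigger.

    `arrivals` is a list of (processing_time, value) pairs sorted by
    processing time. The trigger fires `delay` after the first element
    arrives. Event timestamps play no role here.
    """
    if not arrivals:
        return None
    fire_at = arrivals[0][0] + delay
    # Assumption: only elements that arrived strictly before the firing
    # time are included in the pane.
    pane = [value for t, value in arrivals if fire_at > t]
    return fire_at, pane


print(first_pane([(100, "a"), (130, "b"), (159, "c"), (170, "d")], delay=60))
# prints (160, ['a', 'b', 'c'])
```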
&lt;h3 id="data-driven-triggers">9.3. Data-driven triggers&lt;/h3>
&lt;p>Beam provides one data-driven trigger,
&lt;span class="language-java">&lt;code>AfterPane.elementCountAtLeast()&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>AfterCount&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>trigger.AfterCount()&lt;/code>&lt;/span>. This trigger works on an element
count; it fires after the current pane has collected at least &lt;em>N&lt;/em> elements. This
allows a window to emit early results (before all the data has accumulated),
which can be particularly useful if you are using a single global window.&lt;/p>
&lt;p>It is important to note that if, for example, you specify
&lt;span class="language-java">&lt;code>.elementCountAtLeast(50)&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>AfterCount(50)&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>trigger.AfterCount(50)&lt;/code>&lt;/span> and only 32 elements arrive,
those 32 elements sit around forever. If the 32 elements are important to you,
consider using &lt;a href="#composite-triggers">composite triggers&lt;/a> to combine multiple
conditions. This allows you to specify multiple firing conditions such as &amp;ldquo;fire
either when I receive 50 elements, or every 1 second&amp;rdquo;.&lt;/p>
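&lt;p>The difference between a count-only trigger and a combined condition can be modeled in plain Python. The sketch below is a simulation rather than Beam code: &lt;code>count_only_panes&lt;/code> and &lt;code>composite_panes&lt;/code> are hypothetical helpers, arrival times are integer milliseconds, and the time condition is checked only when the next element arrives or the input ends.&lt;/p>

```python
def count_only_panes(arrivals, count):
    """Fire only when `count` elements have been collected."""
    panes, pane = [], []
    for _, value in arrivals:
        pane.append(value)
        if len(pane) >= count:
            panes.append(pane)
            pane = []
    # Return the fired panes and the elements that are still waiting.
    return panes, pane


def composite_panes(arrivals, count, max_wait_ms):
    """Fire on `count` elements OR `max_wait_ms` after the first element
    in the pane, whichever comes first (repeatedly)."""
    panes, pane = [], []
    deadline = None
    for t, value in arrivals:
        # Time condition: the timer for the current pane has already elapsed.
        if pane and t >= deadline:
            panes.append(pane)
            pane = []
        if not pane:
            deadline = t + max_wait_ms
        pane.append(value)
        # Count condition.
        if len(pane) >= count:
            panes.append(pane)
            pane = []
    if pane:
        # The timer eventually fires for whatever is left.
        panes.append(pane)
    return panes


arrivals = [(i * 10, i) for i in range(32)]  # 32 elements, 10 ms apart
fired, waiting = count_only_panes(arrivals, 50)
print(len(fired), len(waiting))
# prints 0 32
print([len(p) for p in composite_panes(arrivals, 50, 1000)])
# prints [32]
```

&lt;p>In this model the count-only trigger never fires for the 32 elements, while the combined condition emits them once the one-second timer elapses.&lt;/p>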
&lt;h3 id="setting-a-trigger">9.4. Setting a trigger&lt;/h3>
&lt;p>When you set a windowing function for a &lt;code>PCollection&lt;/code> by using the
&lt;span class="language-java">&lt;code>Window&lt;/code>&lt;/span>&lt;span class="language-py">&lt;code>WindowInto&lt;/code>&lt;/span>&lt;span class="language-go">&lt;code>beam.WindowInto&lt;/code>&lt;/span>
transform, you can also specify a trigger.&lt;/p>
&lt;p class="language-java">You set the trigger(s) for a &lt;code>PCollection&lt;/code> by invoking the method
&lt;code>.triggering()&lt;/code> on the result of your &lt;code>Window.into()&lt;/code> transform. This code
sample sets a time-based trigger for a &lt;code>PCollection&lt;/code>, which emits results one
minute after the first element in that window has been processed. The last line
in the code sample, &lt;code>.discardingFiredPanes()&lt;/code>, sets the window&amp;rsquo;s &lt;strong>accumulation
mode&lt;/strong>.&lt;/p>
&lt;p class="language-py">You set the trigger(s) for a &lt;code>PCollection&lt;/code> by setting the &lt;code>trigger&lt;/code> parameter
when you use the &lt;code>WindowInto&lt;/code> transform. This code sample sets a time-based
trigger for a &lt;code>PCollection&lt;/code>, which emits results one minute after the first
element in that window has been processed. The &lt;code>accumulation_mode&lt;/code> parameter
sets the window&amp;rsquo;s &lt;strong>accumulation mode&lt;/strong>.&lt;/p>
&lt;p class="language-go">You set the trigger(s) for a &lt;code>PCollection&lt;/code> by passing in the &lt;code>beam.Trigger&lt;/code> parameter
when you use the &lt;code>beam.WindowInto&lt;/code> transform. This code sample sets a time-based
trigger for a &lt;code>PCollection&lt;/code>, which emits results one minute after the first
element in that window has been processed.
The &lt;code>beam.AccumulationMode&lt;/code> parameter sets the window&amp;rsquo;s &lt;strong>accumulation mode&lt;/strong>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">triggering&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">pastFirstElementInPane&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">plusDelayOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">discardingFiredPanes&lt;/span>&lt;span class="o">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pcollection&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">trigger&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">accumulation_mode&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">AccumulationMode&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DISCARDING&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">windowedItems&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewFixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">pcollection&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Trigger&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">trigger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AfterProcessingTime&lt;/span>&lt;span class="p">().&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">PlusDelay&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AllowedLateness&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">30&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">PanesDiscard&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="window-accumulation-modes">9.4.1. Window accumulation modes&lt;/h4>
&lt;p>When you specify a trigger, you must also set the window&amp;rsquo;s &lt;strong>accumulation
mode&lt;/strong>. When a trigger fires, it emits the current contents of the window as a
pane. Since a trigger can fire multiple times, the accumulation mode determines
whether the system &lt;em>accumulates&lt;/em> the window panes as the trigger fires, or
&lt;em>discards&lt;/em> them.&lt;/p>
&lt;p class="language-java">To set a window to accumulate the panes that are produced when the trigger
fires, invoke &lt;code>.accumulatingFiredPanes()&lt;/code> when you set the trigger. To set a
window to discard fired panes, invoke &lt;code>.discardingFiredPanes()&lt;/code>.&lt;/p>
&lt;p class="language-py">To set a window to accumulate the panes that are produced when the trigger
fires, set the &lt;code>accumulation_mode&lt;/code> parameter to &lt;code>ACCUMULATING&lt;/code> when you set the
trigger. To set a window to discard fired panes, set &lt;code>accumulation_mode&lt;/code> to
&lt;code>DISCARDING&lt;/code>.&lt;/p>
&lt;p class="language-go">To set a window to accumulate the panes that are produced when the trigger
fires, set the &lt;code>beam.AccumulationMode&lt;/code> parameter to &lt;code>beam.PanesAccumulate()&lt;/code> when you set the
trigger. To set a window to discard fired panes, set &lt;code>beam.AccumulationMode&lt;/code> to
&lt;code>beam.PanesDiscard()&lt;/code>.&lt;/p>
&lt;p>Let&amp;rsquo;s look at an example that uses a &lt;code>PCollection&lt;/code> with fixed-time windowing and a
data-based trigger. This is something you might do if, for example, each window
represented a ten-minute running average, but you wanted to display the current
value of the average in a UI more frequently than every ten minutes. We&amp;rsquo;ll
assume the following conditions:&lt;/p>
&lt;ul>
&lt;li>The &lt;code>PCollection&lt;/code> uses 10-minute fixed-time windows.&lt;/li>
&lt;li>The &lt;code>PCollection&lt;/code> has a repeating trigger that fires every time 3 elements
arrive.&lt;/li>
&lt;/ul>
&lt;p>The following diagram shows data events for key X as they arrive in the
PCollection and are assigned to windows. To keep the diagram a bit simpler,
we&amp;rsquo;ll assume that the events all arrive in the pipeline in order.&lt;/p>
&lt;p>&lt;img src="/images/trigger-accumulation.png" alt="Diagram of data events for accumulating mode example">&lt;/p>
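&lt;p>The difference between the two accumulation modes can be sketched in plain Python. The following block is an illustration only and does not use the Beam API: the &lt;code>fired_panes&lt;/code> helper and its parameters are hypothetical names, and the element values are the ones shown in the diagram, with a firing after every third element.&lt;/p>

```python
def fired_panes(values, every, accumulating):
    """Yield the pane emitted each time `every` new elements have arrived.

    Hypothetical helper for illustration; this is not a Beam function.
    """
    window = []  # every element seen so far in this window
    pane = []    # elements that arrived since the last firing
    for value in values:
        window.append(value)
        pane.append(value)
        if len(pane) == every:
            # Accumulating mode re-emits the whole window so far;
            # discarding mode emits only the elements since the last firing.
            yield list(window) if accumulating else list(pane)
            pane = []


events = [5, 8, 3, 15, 19, 23, 9, 13, 10]
accumulating_panes = list(fired_panes(events, every=3, accumulating=True))
discarding_panes = list(fired_panes(events, every=3, accumulating=False))

for number, pane in enumerate(accumulating_panes, start=1):
    print(f"accumulating firing {number}: {pane}")
for number, pane in enumerate(discarding_panes, start=1):
    print(f"discarding firing {number}: {pane}")
```

&lt;p>In accumulating mode each pane repeats everything the window has seen so far, so a consumer can overwrite its previous result. In discarding mode each pane carries only the new elements, so a consumer that needs the full window has to combine the panes itself.&lt;/p>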
&lt;h5 id="accumulating-mode">9.4.1.1. Accumulating mode&lt;/h5>
&lt;p>If our trigger is set to accumulating mode, the trigger emits the following
values each time it fires. Keep in mind that the trigger fires every time three
elements arrive:&lt;/p>
&lt;pre tabindex="0">&lt;code> First trigger firing: [5, 8, 3]
Second trigger firing: [5, 8, 3, 15, 19, 23]
Third trigger firing: [5, 8, 3, 15, 19, 23, 9, 13, 10]
&lt;/code>&lt;/pre>&lt;h5 id="discarding-mode">9.4.1.2. Discarding mode&lt;/h5>
&lt;p>If our trigger is set to discarding mode, the trigger emits the following values
on each firing:&lt;/p>
&lt;pre tabindex="0">&lt;code> First trigger firing: [5, 8, 3]
Second trigger firing: [15, 19, 23]
Third trigger firing: [9, 13, 10]
&lt;/code>&lt;/pre>&lt;h4 id="handling-late-data">9.4.2. Handling late data&lt;/h4>
&lt;p>If you want your pipeline to process data that arrives after the watermark
passes the end of the window, you can apply an &lt;em>allowed lateness&lt;/em> when you set
your windowing configuration. This gives your trigger the opportunity to react
to the late data. If allowed lateness is set, the default trigger will emit new
results immediately whenever late data arrives.&lt;/p>
&lt;p>You set the allowed lateness by using &lt;span class="language-java">&lt;code>.withAllowedLateness()&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>allowed_lateness&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>beam.AllowedLateness()&lt;/code>&lt;/span>
when you set your windowing function:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">triggering&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">pastFirstElementInPane&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">plusDelayOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withAllowedLateness&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">30&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">Initial&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pc&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">trigger&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">allowed_lateness&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1800&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># 30 minutes&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="o">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">allowedToBeLateItems&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewFixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">pcollection&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Trigger&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">trigger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AfterProcessingTime&lt;/span>&lt;span class="p">().&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">PlusDelay&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AllowedLateness&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">30&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>This allowed lateness propagates to all &lt;code>PCollection&lt;/code>s derived as a result of
applying transforms to the original &lt;code>PCollection&lt;/code>. If you want to change the
allowed lateness later in your pipeline, you can apply
&lt;span class="language-java">&lt;code>Window.configure().withAllowedLateness()&lt;/code>&lt;/span>
&lt;span class="language-py">&lt;code>allowed_lateness&lt;/code>&lt;/span>
&lt;span class="language-go">&lt;code>beam.AllowedLateness()&lt;/code>&lt;/span>
again, explicitly.&lt;/p>
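&lt;p>As a rough mental model of allowed lateness, the following plain-Python sketch classifies an element by where the watermark stands when the element arrives. It does not use the Beam API: &lt;code>classify_arrival&lt;/code> and its labels are hypothetical, and the exact treatment of the boundary instants is an assumption made for the illustration.&lt;/p>

```python
from datetime import datetime, timedelta


def classify_arrival(watermark, window_end, allowed_lateness):
    """Classify an element of a window by the watermark at its arrival.

    Hypothetical helper for illustration; boundary handling is an assumption.
    """
    if watermark >= window_end + allowed_lateness:
        # The window has expired, so the element is no longer processed.
        return "dropped"
    if watermark >= window_end:
        # Late, but within the allowed lateness: the trigger can still react.
        return "late, still processed"
    return "on time"


window_end = datetime(2024, 1, 1, 12, 10)
allowed_lateness = timedelta(minutes=30)

for watermark in (
    datetime(2024, 1, 1, 12, 5),
    datetime(2024, 1, 1, 12, 25),
    datetime(2024, 1, 1, 12, 45),
):
    print(watermark.time(), classify_arrival(watermark, window_end, allowed_lateness))
```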
&lt;h3 id="composite-triggers">9.5. Composite triggers&lt;/h3>
&lt;p>You can combine multiple triggers to form &lt;strong>composite triggers&lt;/strong>, and can
specify a trigger to emit results repeatedly, at most once, or under other
custom conditions.&lt;/p>
&lt;h4 id="composite-trigger-types">9.5.1. Composite trigger types&lt;/h4>
&lt;p>Beam includes the following composite triggers:&lt;/p>
&lt;ul>
&lt;li>You can add additional early firings or late firings to
&lt;code>AfterWatermark.pastEndOfWindow&lt;/code> via &lt;code>.withEarlyFirings&lt;/code> and
&lt;code>.withLateFirings&lt;/code>.&lt;/li>
&lt;li>&lt;code>Repeatedly.forever&lt;/code> specifies a trigger that executes forever. Any time the
trigger&amp;rsquo;s conditions are met, it causes a window to emit results and then
resets and starts over. It can be useful to combine &lt;code>Repeatedly.forever&lt;/code>
with &lt;code>.orFinally&lt;/code> to specify a condition that causes the repeating trigger
to stop.&lt;/li>
&lt;li>&lt;code>AfterEach.inOrder&lt;/code> combines multiple triggers to fire in a specific
sequence. Each time a trigger in the sequence emits a window, the sequence
advances to the next trigger.&lt;/li>
&lt;li>&lt;code>AfterFirst&lt;/code> takes multiple triggers and emits the first time &lt;em>any&lt;/em> of its
argument triggers is satisfied. This is equivalent to a logical OR operation
for multiple triggers.&lt;/li>
&lt;li>&lt;code>AfterAll&lt;/code> takes multiple triggers and emits when &lt;em>all&lt;/em> of its argument
triggers are satisfied. This is equivalent to a logical AND operation for
multiple triggers.&lt;/li>
&lt;li>&lt;code>orFinally&lt;/code> can serve as a final condition to cause any trigger to fire one
final time and never fire again.&lt;/li>
&lt;/ul>
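&lt;p>The logical OR and AND reading of &lt;code>AfterFirst&lt;/code> and &lt;code>AfterAll&lt;/code> can be sketched in plain Python. This is a simplified model, not the Beam API: it treats each trigger as a stateless predicate over a hypothetical &lt;code>PaneState&lt;/code> snapshot, and every name in the block is invented for the illustration.&lt;/p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaneState:
    """Hypothetical snapshot of a pane; not a Beam type."""
    element_count: int
    seconds_since_first_element: float


def count_at_least(n):
    """Condition: the pane holds at least n elements."""
    return lambda pane: pane.element_count >= n


def delay_elapsed(seconds):
    """Condition: at least `seconds` have passed since the first element."""
    return lambda pane: pane.seconds_since_first_element >= seconds


def after_first(*conditions):
    """Satisfied when any of the conditions is satisfied (logical OR)."""
    return lambda pane: any(condition(pane) for condition in conditions)


def after_all(*conditions):
    """Satisfied when all of the conditions are satisfied (logical AND)."""
    return lambda pane: all(condition(pane) for condition in conditions)


pane = PaneState(element_count=120, seconds_since_first_element=20.0)
either = after_first(count_at_least(100), delay_elapsed(60))
both = after_all(count_at_least(100), delay_elapsed(60))

print(either(pane))  # True: the count condition alone is enough
print(both(pane))    # False: the delay has not elapsed yet
```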
&lt;h4 id="composite-afterwatermark">9.5.2. Composition with AfterWatermark&lt;/h4>
&lt;p>Some of the most useful composite triggers fire a single time when Beam
estimates that all the data has arrived (i.e. when the watermark passes the end
of the window) combined with either, or both, of the following:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Speculative firings that precede the watermark passing the end of the window
to allow faster processing of partial results.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Late firings that happen after the watermark passes the end of the window,
to allow for handling late-arriving data.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>You can express this pattern using &lt;code>AfterWatermark&lt;/code>. For example, the following
trigger code fires on these conditions:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>On Beam&amp;rsquo;s estimate that all the data has arrived (the watermark passes the
end of the window)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Any time late data arrives, after a ten-minute delay&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p class="language-java">&lt;ul>
&lt;li>After two days, we assume no more data of interest will arrive, and the
trigger stops executing&lt;/li>
&lt;/ul>
&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Window&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">configure&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">triggering&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">AfterWatermark&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">pastEndOfWindow&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withLateFirings&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">AfterProcessingTime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">pastFirstElementInPane&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">plusDelayOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">10&lt;/span>&lt;span class="o">))))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withAllowedLateness&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardDays&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">2&lt;/span>&lt;span class="o">)));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pcollection&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">trigger&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">AfterWatermark&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">late&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>&lt;span class="p">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">allowed_lateness&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">accumulation_mode&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">AccumulationMode&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DISCARDING&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">compositeTriggerItems&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewFixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">pcollection&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Trigger&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">trigger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AfterEndOfWindow&lt;/span>&lt;span class="p">().&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">LateFiring&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">trigger&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AfterProcessingTime&lt;/span>&lt;span class="p">().&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">PlusDelay&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">))),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">AllowedLateness&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="mi">24&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Hour&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="other-composite-triggers">9.5.3. Other composite triggers&lt;/h4>
&lt;p>You can also build other sorts of composite triggers. The following example code
shows a simple composite trigger that fires whenever the pane has at least 100
elements, or after a minute.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Repeatedly&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">forever&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">AfterFirst&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">AfterPane&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">elementCountAtLeast&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">100&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">pastFirstElementInPane&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">plusDelayOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">))))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">pcollection&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">trigger&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">Repeatedly&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">AfterAny&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">AfterCount&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">100&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">AfterProcessingTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span>&lt;span class="p">))),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">accumulation_mode&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">AccumulationMode&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DISCARDING&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
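&lt;p>To see how such a repeating trigger splits a stream into panes, the following plain-Python sketch groups arrival times (in seconds of processing time) into panes that close at 100 elements or one minute after the first element in the pane. It is an approximation and not the Beam API: &lt;code>simulate_panes&lt;/code> is a hypothetical helper, and it only checks the delay when the next element arrives, whereas a real processing-time trigger fires from a timer.&lt;/p>

```python
def simulate_panes(arrival_seconds, max_count=100, max_delay=60):
    """Group arrival times into panes, in discarding mode.

    A pane closes when it holds max_count elements, or when max_delay
    seconds have passed since its first element. Hypothetical helper:
    the delay is only checked when the next element arrives.
    """
    panes = []
    current = []
    for arrival in arrival_seconds:
        # The delay for the open pane expired before this element arrived.
        if current and arrival - current[0] >= max_delay:
            panes.append(current)
            current = []
        current.append(arrival)
        # The element-count condition is met.
        if len(current) >= max_count:
            panes.append(current)
            current = []
    return panes, current  # fired panes, and elements still buffered


# A burst of 100 elements at t=0, then stragglers at t=5, t=10 and t=70.
arrivals = [0] * 100 + [5, 10, 70]
panes, buffered = simulate_panes(arrivals)

print([len(pane) for pane in panes])  # [100, 2]
print(buffered)                       # [70]
```

&lt;p>The first pane fires on the element count, the second on the delay, and the last element waits in a new pane until one of the two conditions is met again.&lt;/p>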
&lt;h2 id="metrics">10. Metrics&lt;/h2>
&lt;p>In the Beam model, metrics provide insight into the current state of a user pipeline,
potentially while the pipeline is running. For example, you can use metrics to:&lt;/p>
&lt;ul>
&lt;li>Check the number of errors encountered while running a specific step in the pipeline.&lt;/li>
&lt;li>Monitor the number of RPCs made to a backend service.&lt;/li>
&lt;li>Retrieve an accurate count of the number of elements that have been processed.&lt;/li>
&lt;/ul>
&lt;h3 id="101-the-main-concepts-of-beam-metrics">10.1. The main concepts of Beam metrics&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Named&lt;/strong>. Each metric has a name which consists of a namespace and an actual name. The
namespace can be used to differentiate between multiple metrics with the same name and also
allows querying for all metrics within a specific namespace.&lt;/li>
&lt;li>&lt;strong>Scoped&lt;/strong>. Each metric is reported against a specific step in the pipeline, indicating what
code was running when the metric was incremented.&lt;/li>
&lt;li>&lt;strong>Dynamically Created&lt;/strong>. Metrics may be created during runtime without pre-declaring them, in
much the same way a logger could be created. This makes it easier to produce metrics in utility
code and have them usefully reported.&lt;/li>
&lt;li>&lt;strong>Degrade Gracefully&lt;/strong>. If a runner doesn’t support some part of reporting metrics, the
fallback behavior is to drop the metric updates rather than failing the pipeline. If a runner
doesn’t support some part of querying metrics, the runner will not return the associated data.&lt;/li>
&lt;/ul>
&lt;p>Reported metrics are implicitly scoped to the transform within the pipeline that reported them.
This allows reporting the same metric name in multiple places and identifying the value each
transform reported, as well as aggregating the metric across the entire pipeline.&lt;/p>
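&lt;p>The naming and scoping rules can be pictured with a small plain-Python sketch. It does not use the Beam metrics API: the step names, the metric name, and the update tuples are hypothetical, and the two &lt;code>collections.Counter&lt;/code> objects only stand in for whatever a runner keeps internally.&lt;/p>

```python
from collections import Counter

# Hypothetical metric updates: (step, namespace, name, increment).
updates = [
    ("ParseEvents", "namespace", "errors", 2),
    ("WriteResults", "namespace", "errors", 1),
    ("ParseEvents", "namespace", "errors", 3),
]

per_step = Counter()        # value reported by each transform
pipeline_total = Counter()  # same metric aggregated across the pipeline

for step, namespace, name, amount in updates:
    per_step[(step, namespace, name)] += amount
    pipeline_total[(namespace, name)] += amount

print(per_step[("ParseEvents", "namespace", "errors")])   # 5
print(per_step[("WriteResults", "namespace", "errors")])  # 1
print(pipeline_total[("namespace", "errors")])            # 6
```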
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> It is runner-dependent whether metrics are accessible during pipeline execution or only
after jobs have completed.&lt;/p>
&lt;/blockquote>
&lt;h3 id="types-of-metrics">10.2. Types of metrics&lt;/h3>
&lt;p>Beam currently supports three types of metrics: &lt;code>Counter&lt;/code>, &lt;code>Distribution&lt;/code>, and
&lt;code>Gauge&lt;/code>.&lt;/p>
&lt;p class="language-go">In the Beam SDK for Go, a &lt;code>context.Context&lt;/code> provided by the framework must be passed to the metric
or the metric value will not be recorded. The framework will automatically provide a valid
&lt;code>context.Context&lt;/code> to &lt;code>ProcessElement&lt;/code> and similar methods when it&amp;rsquo;s the first parameter.&lt;/p>
&lt;p>&lt;strong>Counter&lt;/strong>: A metric that reports a single long value and can be incremented or decremented.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Counter&lt;/span> &lt;span class="n">counter&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Metrics&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">counter&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;namespace&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;counter1&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">context&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// count the elements
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">counter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">inc&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="nx">counter&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewCounter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;namespace&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;counter1&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">MyDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">...&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// count the elements
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">counter&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Inc&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>&lt;strong>Distribution&lt;/strong>: A metric that reports information about the distribution of reported values.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Distribution&lt;/span> &lt;span class="n">distribution&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Metrics&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">distribution&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;namespace&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;distribution1&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">context&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Integer&lt;/span> &lt;span class="n">element&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">context&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// record the element in the distribution
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">distribution&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">update&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="nx">distribution&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewDistribution&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;namespace&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;distribution1&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">MyDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">...&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// record the value in the distribution
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">distribution&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Update&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
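&lt;p>As a rough mental model, a distribution can be pictured as a running summary of the reported values. The sketch below is plain Java rather than Beam code: the &lt;code>DistributionSketch&lt;/code> class and its fields are hypothetical names, chosen to mirror the count, sum, minimum, maximum, and mean that the Java SDK exposes on &lt;code>DistributionResult&lt;/code>. How a runner aggregates updates across workers is not shown.&lt;/p>

```java
// Plain-Java model of the summary a distribution metric keeps.
// DistributionSketch is a hypothetical name, not a Beam class.
final class DistributionSketch {
  long count = 0;
  long sum = 0;
  long min = Long.MAX_VALUE;
  long max = Long.MIN_VALUE;

  // Fold one reported value into the running summary.
  void update(long value) {
    count += 1;
    sum += value;
    min = Math.min(min, value);
    max = Math.max(max, value);
  }

  // Mean of the reported values, or 0.0 when nothing was reported.
  double mean() {
    return count == 0 ? 0.0 : (double) sum / count;
  }

  public static void main(String[] args) {
    DistributionSketch d = new DistributionSketch();
    for (long v : new long[] {4, 10, 1}) {
      d.update(v);
    }
    System.out.println(
        "count=" + d.count + " sum=" + d.sum + " min=" + d.min + " max=" + d.max
            + " mean=" + d.mean());
  }
}
```

&lt;p>Running &lt;code>main&lt;/code> prints &lt;code>count=3 sum=15 min=1 max=10 mean=5.0&lt;/code>.&lt;/p>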
&lt;p>&lt;strong>Gauge&lt;/strong>: A metric that reports the latest value out of reported values. Because metrics are
collected from many workers, the value may not be the absolute last one reported, but it is one of the latest values.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Gauge&lt;/span> &lt;span class="n">gauge&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Metrics&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">gauge&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;namespace&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;gauge1&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">context&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Integer&lt;/span> &lt;span class="n">element&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">context&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// set the gauge to the latest value received
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">gauge&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">var&lt;/span> &lt;span class="nx">gauge&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewGauge&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;namespace&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;gauge1&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">MyDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">...&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// set the gauge to the latest value received
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">gauge&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
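&lt;p>The "one of the latest values" behavior of a gauge can be illustrated with a small model, assuming each worker reports its gauge value together with a timestamp and that merging keeps the report with the later timestamp. This is plain Java with hypothetical names (&lt;code>GaugeSketch&lt;/code>, &lt;code>latest&lt;/code>), not the Beam implementation.&lt;/p>

```java
// Plain-Java model of merging gauge reports from two workers.
// GaugeSketch and latest are hypothetical names, not Beam classes.
final class GaugeSketch {
  final long value;
  final long timestampMillis;

  GaugeSketch(long value, long timestampMillis) {
    this.value = value;
    this.timestampMillis = timestampMillis;
  }

  // Keep whichever report carries the later timestamp (ties keep the first).
  static GaugeSketch latest(GaugeSketch a, GaugeSketch b) {
    return b.timestampMillis > a.timestampMillis ? b : a;
  }

  public static void main(String[] args) {
    GaugeSketch workerA = new GaugeSketch(7, 1_000);
    GaugeSketch workerB = new GaugeSketch(3, 1_005);
    GaugeSketch merged = latest(workerA, workerB);
    System.out.println("gauge=" + merged.value); // worker B reported later
  }
}
```

&lt;p>Because worker clocks are not perfectly synchronized, the report that wins such a merge may not be the value that was set last in real time.&lt;/p>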
&lt;h3 id="querying-metrics">10.3. Querying metrics&lt;/h3>
&lt;p class="language-java language-python">&lt;code>PipelineResult&lt;/code> has a &lt;code>metrics()&lt;/code> method that returns a &lt;code>MetricResults&lt;/code> object for
accessing metrics. The main method available in &lt;code>MetricResults&lt;/code> queries for all metrics
that match a given filter.&lt;/p>
&lt;p class="language-go">&lt;code>beam.PipelineResult&lt;/code> has a &lt;code>Metrics()&lt;/code> method that returns a &lt;code>metrics.Results&lt;/code> object for
accessing metrics. The main method available in &lt;code>metrics.Results&lt;/code> queries for all metrics
that match a given filter. It takes a predicate with a &lt;code>SingleResult&lt;/code> parameter, which you can
use to write custom filters.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">interface&lt;/span> &lt;span class="nc">PipelineResult&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MetricResults&lt;/span> &lt;span class="nf">metrics&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">abstract&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">MetricResults&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">abstract&lt;/span> &lt;span class="n">MetricQueryResults&lt;/span> &lt;span class="nf">queryMetrics&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Nullable&lt;/span> &lt;span class="n">MetricsFilter&lt;/span> &lt;span class="n">filter&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">interface&lt;/span> &lt;span class="nc">MetricQueryResults&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">MetricResult&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="nf">getCounters&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">MetricResult&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">DistributionResult&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="nf">getDistributions&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">MetricResult&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">GaugeResult&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="nf">getGauges&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">interface&lt;/span> &lt;span class="nc">MetricResult&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">T&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MetricName&lt;/span> &lt;span class="nf">getName&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">String&lt;/span> &lt;span class="nf">getStep&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">T&lt;/span> &lt;span class="nf">getCommitted&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">T&lt;/span> &lt;span class="nf">getAttempted&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">queryMetrics&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">pr&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PipelineResult&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ns&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">n&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">metrics&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">QueryResults&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">pr&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Metrics&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Query&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">r&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MetricResult&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">bool&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Namespace&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="nx">ns&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Name&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="nx">n&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="using-metrics">10.4. Using metrics in pipeline&lt;/h3>
&lt;p>The following example shows how to use a &lt;code>Counter&lt;/code> metric in a user pipeline.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// creating a pipeline with custom metrics DoFn
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(...)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">MyMetricsDoFn&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
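&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PipelineResult&lt;/span> &lt;span class="n">pipelineResult&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">run&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">pipelineResult&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">waitUntilFinish&lt;/span>&lt;span class="o">(...);&lt;/span>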
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// request the metric called &amp;#34;counter1&amp;#34; in namespace called &amp;#34;namespace&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">MetricQueryResults&lt;/span> &lt;span class="n">metrics&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipelineResult&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">metrics&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">queryMetrics&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MetricsFilter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">builder&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">addNameFilter&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MetricNameFilter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">named&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;namespace&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;counter1&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// print the metric value - there should be only one line because there is only one metric
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// called &amp;#34;counter1&amp;#34; in the namespace called &amp;#34;namespace&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">for&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">MetricResult&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">counter&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="n">metrics&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getCounters&lt;/span>&lt;span class="o">())&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">System&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">println&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">counter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getName&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s">&amp;#34;:&amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">counter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getAttempted&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">MyMetricsDoFn&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">Counter&lt;/span> &lt;span class="n">counter&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Metrics&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">counter&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;namespace&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;counter1&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">context&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// count the elements
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">counter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">inc&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">context&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">context&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">addMetricDoFnToPipeline&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">input&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">MyMetricsDoFn&lt;/span>&lt;span class="p">{},&lt;/span> &lt;span class="nx">input&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">executePipelineAndGetMetrics&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">p&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Pipeline&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">metrics&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">QueryResults&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">error&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">pr&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Run&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">runner&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">p&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">metrics&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">QueryResults&lt;/span>&lt;span class="p">{},&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Request the metric called &amp;#34;counter1&amp;#34; in namespace called &amp;#34;namespace&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">ms&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">pr&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Metrics&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Query&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">r&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MetricResult&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">bool&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Namespace&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;namespace&amp;#34;&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="nx">r&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Name&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;counter1&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Print the metric value - there should be only one line because there is
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// only one metric called &amp;#34;counter1&amp;#34; in the namespace called &amp;#34;namespace&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">c&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="k">range&lt;/span> &lt;span class="nx">ms&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Counters&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Println&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">c&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Namespace&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="s">&amp;#34;-&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">c&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Name&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="s">&amp;#34;:&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">c&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Committed&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">ms&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">MyMetricsDoFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">counter&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Counter&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RegisterType&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">TypeOf&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">MyMetricsDoFn&lt;/span>&lt;span class="p">)(&lt;/span>&lt;span class="kc">nil&lt;/span>&lt;span class="p">)))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">MyMetricsDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">Setup&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// While metrics can be defined in package scope or dynamically,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// it&amp;#39;s most efficient to include them in the DoFn.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">counter&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewCounter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;namespace&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;counter1&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">MyMetricsDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span> &lt;span class="nx">context&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// count the elements
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">counter&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Inc&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ctx&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">v&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="export-metrics">10.5. Export metrics&lt;/h3>
&lt;p>Beam metrics can be exported to external sinks. If a metrics sink is configured, the runner pushes metrics to it periodically, every 5 seconds by default.
The configuration is held in the &lt;a href="https://beam.apache.org/releases/javadoc/2.19.0/org/apache/beam/sdk/metrics/MetricsOptions.html">MetricsOptions&lt;/a> class.
It contains the push period and sink-specific options such as the sink type and URL. Currently, only the REST HTTP and Graphite sinks are supported, and only
the Flink and Spark runners support metrics export.&lt;/p>
&lt;p>Beam metrics are also exported to the internal Spark and Flink dashboards, where they can be viewed in each runner&amp;rsquo;s UI.&lt;/p>
&lt;h2 id="state-and-timers">11. State and Timers&lt;/h2>
&lt;p>Beam&amp;rsquo;s windowing and triggering facilities provide a powerful abstraction for grouping and aggregating unbounded input
data based on timestamps. However, there are aggregation use cases for which developers may require a higher degree of
control than provided by windows and triggers. Beam provides an API for manually managing per-key state, allowing for
fine-grained control over aggregations.&lt;/p>
&lt;p>Beam&amp;rsquo;s state API models state per key. To use the state API, you start out with a keyed &lt;code>PCollection&lt;/code>, which in Java
is modeled as a &lt;code>PCollection&amp;lt;KV&amp;lt;K, V&amp;gt;&amp;gt;&lt;/code>. A &lt;code>ParDo&lt;/code> processing this &lt;code>PCollection&lt;/code> can now declare state variables. Inside
the &lt;code>ParDo&lt;/code> these state variables can be used to write or update state for the current key or to read previous state
written for that key. State is always fully scoped only to the current processing key.&lt;/p>
&lt;p>Windowing can still be used together with stateful processing. All state for a key is scoped to the current window. This
means that the first time a key is seen for a given window, any state reads return empty, and that a runner can
garbage collect state when a window is completed. It&amp;rsquo;s also often useful to use Beam&amp;rsquo;s windowed aggregations prior to
the stateful operator, for example by using a combiner to preaggregate data and then storing the aggregated data in
state. Merging windows are not currently supported when using state and timers.&lt;/p>
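&lt;p>The scoping rules above can be illustrated with a toy model in plain Python. This is only a sketch of the semantics, not Beam API: &lt;code>ToyStateStore&lt;/code>, &lt;code>count_element&lt;/code>, and the window labels &lt;code>w0&lt;/code> and &lt;code>w1&lt;/code> are hypothetical names introduced here. It assumes state is addressed by a (key, window) pair, that a read before any write returns empty, and that all state for a window can be dropped once the window completes.&lt;/p>

```python
class ToyStateStore:
    """Toy model of keyed, windowed state. Hypothetical; not part of the Beam API."""

    def __init__(self):
        # Maps (key, window) -> {state_name: value}.
        self._cells = {}

    def read(self, key, window, name):
        # Returns None when nothing was written for this key and window.
        return self._cells.get((key, window), {}).get(name)

    def write(self, key, window, name, value):
        self._cells.setdefault((key, window), {})[name] = value

    def expire_window(self, window):
        # Drop all state for a completed window, as a runner is allowed to do.
        for cell in [c for c in self._cells if c[1] == window]:
            del self._cells[cell]


def count_element(store, key, window):
    """Read-modify-write a per-key, per-window counter and return the new count."""
    current = store.read(key, window, "count") or 0
    store.write(key, window, "count", current + 1)
    return current + 1


store = ToyStateStore()
counts = [
    count_element(store, "user1", "w0"),  # first element for user1 in w0
    count_element(store, "user1", "w0"),  # same key, same window
    count_element(store, "user2", "w0"),  # different key starts from empty
    count_element(store, "user1", "w1"),  # same key, new window starts from empty
]
store.expire_window("w0")
```

&lt;p>In this model each new key or new window starts from an empty counter, and after &lt;code>expire_window("w0")&lt;/code> only the state written for &lt;code>w1&lt;/code> remains.&lt;/p>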
&lt;p>Sometimes stateful processing is used to implement state-machine style processing inside a &lt;code>DoFn&lt;/code>. When doing this,
remember that the elements in the input &lt;code>PCollection&lt;/code> have no guaranteed order, and ensure that the
program logic is resilient to this. Unit tests written using the DirectRunner shuffle the order of element
processing and are recommended for testing correctness.&lt;/p>
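&lt;p>One way to make such logic resilient to ordering is to decide state transitions from data carried by the element, such as an event timestamp, rather than from arrival order. The following plain-Python sketch is illustrative only and does not use Beam: &lt;code>latest_status&lt;/code> and the sample events are hypothetical, and the local variable &lt;code>state&lt;/code> stands in for a per-key state cell. It assumes event timestamps are distinct within a key.&lt;/p>

```python
import random


def latest_status(events):
    """Return the status with the highest event timestamp, whatever the arrival order."""
    state = None  # stands in for a per-key state cell holding (timestamp, status)
    for timestamp, status in events:
        # Only move forward in event time; late arrivals never overwrite newer state.
        if state is None or timestamp > state[0]:
            state = (timestamp, status)
    return state[1] if state else None


events = [(1, "created"), (2, "paid"), (3, "shipped")]

# Process the same elements in many different orders, as a shuffling test runner might.
rng = random.Random(0)
results = set()
for _ in range(20):
    shuffled = events[:]
    rng.shuffle(shuffled)
    results.add(latest_status(shuffled))
```

&lt;p>Because the comparison uses the event timestamp, every shuffled run ends in the same final status. A version that simply overwrote the state with each arriving element would depend on arrival order.&lt;/p>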
&lt;p class="language-java">In Java, a &lt;code>DoFn&lt;/code> declares the states it accesses by creating final &lt;code>StateSpec&lt;/code> member variables, one for each state. Each
state must be named using the &lt;code>StateId&lt;/code> annotation; this name is unique to a ParDo in the graph and has no relation
to other nodes in the graph. A &lt;code>DoFn&lt;/code> can declare multiple state variables.&lt;/p>
&lt;p class="language-py">In Python, a &lt;code>DoFn&lt;/code> declares the states it accesses by creating &lt;code>StateSpec&lt;/code> class member variables, one for each state. Each
&lt;code>StateSpec&lt;/code> is initialized with a name; this name is unique to a ParDo in the graph and has no relation
to other nodes in the graph. A &lt;code>DoFn&lt;/code> can declare multiple state variables.&lt;/p>
&lt;p class="language-go">In Go, a &lt;code>DoFn&lt;/code> declares the states it accesses by creating state struct member variables, one for each state. Each
state variable is initialized with a key; this key is unique to a ParDo in the graph and has no relation
to other nodes in the graph. If no key is supplied, the key defaults to the member variable&amp;rsquo;s name.
A &lt;code>DoFn&lt;/code> can declare multiple state variables.&lt;/p>
&lt;span class="language-typescript">
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> The Beam SDK for TypeScript does not yet support a State and Timer API,
but it is possible to use these features from cross-language pipelines (see below).&lt;/p>
&lt;/blockquote>
&lt;/span>
&lt;h3 id="types-of-state">11.1. Types of state&lt;/h3>
&lt;p>Beam provides several types of state:&lt;/p>
&lt;h4 id="valuestate">ValueState&lt;/h4>
&lt;p>A ValueState is a scalar state value. For each key in the input, a ValueState will store a typed value that can be
read and modified inside the DoFn&amp;rsquo;s &lt;code>@ProcessElement&lt;/code> or &lt;code>@OnTimer&lt;/code> methods. If the type of the ValueState has a coder
registered, then Beam will automatically infer the coder for the state value. Otherwise, a coder can be explicitly
specified when creating the ValueState. For example, the following ParDo creates a single state variable that
accumulates the number of elements seen.&lt;/p>
&lt;p>Note: &lt;code>ValueState&lt;/code> is called &lt;code>ReadModifyWriteState&lt;/code> in the Python SDK.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">numElements&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Read the number of elements seen so far for this user key.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// state.read() returns null if it was never set. The below code allows us to have a default value of 0.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="n">currentValue&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">MoreObjects&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">firstNonNull&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="n">0&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Update the state.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">currentValue&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">1&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// valueStateFn keeps track of the number of elements seen.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">valueStateFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Val&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Value&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">valueStateFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">book&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitWords&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Get the value stored in our state
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">val&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ok&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Val&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">ok&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Val&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Val&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">val&lt;/span>&lt;span class="o">+&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">val&lt;/span> &lt;span class="p">&amp;gt;&lt;/span> &lt;span class="mi">10000&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Example of clearing the state and starting again from an empty value
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Val&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Beam also allows explicitly specifying a coder for &lt;code>ValueState&lt;/code> values. For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">MyType&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">numElements&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">MyTypeCoder&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ReadModifyWriteStateDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">STATE_SPEC&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ReadModifyWriteStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;num_elements&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">VarIntCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">STATE_SPEC&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Read the number of elements seen so far for this user key.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">current_value&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="ow">or&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">current_value&lt;/span>&lt;span class="o">+&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Read per user&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadPerUser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;state pardo&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ReadModifyWriteStateDoFn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">valueStateDoFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Val&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Value&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">MyCustomType&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">encode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">m&lt;/span> &lt;span class="nx">MyCustomType&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">byte&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">m&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Bytes&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">decode&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">b&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="kt">byte&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">MyCustomType&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">MyCustomType&lt;/span>&lt;span class="p">{}.&lt;/span>&lt;span class="nf">FromBytes&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">b&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">init&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RegisterCoder&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">reflect&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">TypeOf&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">MyCustomType&lt;/span>&lt;span class="p">)(&lt;/span>&lt;span class="kc">nil&lt;/span>&lt;span class="p">)).&lt;/span>&lt;span class="nf">Elem&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="nx">encode&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">decode&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-typescript snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-typescript" data-lang="typescript">&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">pcoll&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nx">root&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">create&lt;/span>&lt;span class="p">([&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">key&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;a&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span>: &lt;span class="kt">1&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">key&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;b&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span>: &lt;span class="kt">10&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span> &lt;span class="nx">key&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="s2">&amp;#34;a&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span>: &lt;span class="kt">100&lt;/span> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kr">const&lt;/span> &lt;span class="nx">result&lt;/span>: &lt;span class="kt">PCollection&lt;/span>&lt;span class="p">&amp;lt;&lt;/span>&lt;span class="nt">number&lt;/span>&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">await&lt;/span> &lt;span class="nx">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">apply&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">withCoderInternal&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="nx">KVCoder&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="nx">StrUtf8Coder&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="nx">VarIntCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">.&lt;/span>&lt;span class="nx">applyAsync&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">pythonTransform&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Construct a new Transform from source.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="s2">&amp;#34;__constructor__&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">pythonCallable&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sb">`
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> # Define a DoFn to be used below.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> class ReadModifyWriteStateDoFn(beam.DoFn):
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> STATE_SPEC = beam.transforms.userstate.ReadModifyWriteStateSpec(
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> &amp;#39;num_elements&amp;#39;, beam.coders.VarIntCoder())
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> def process(self, element, state=beam.DoFn.StateParam(STATE_SPEC)):
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> current_value = state.read() or 0
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> state.write(current_value + 1)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> yield current_value + 1
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> class MyPythonTransform(beam.PTransform):
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> def expand(self, pcoll):
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> return pcoll | beam.ParDo(ReadModifyWriteStateDoFn())
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="sb"> `&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Keyword arguments to pass to the transform, if any.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">{},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Output type if it cannot be inferred
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">requestedOutputCoders&lt;/span>&lt;span class="o">:&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="nx">output&lt;/span>: &lt;span class="kt">new&lt;/span> &lt;span class="nx">VarIntCoder&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">}&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="combiningstate">CombiningState&lt;/h4>
&lt;p>&lt;code>CombiningState&lt;/code> lets you create a state object that is updated using a Beam combiner. For example, the previous
&lt;code>ValueState&lt;/code> example could be rewritten to use &lt;code>CombiningState&lt;/code>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">CombiningState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="o">[],&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">numElements&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">combining&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Sum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ofIntegers&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">add&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">CombiningStateDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">SUM_TOTAL&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">CombiningValueStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;total&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">sum&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">SUM_TOTAL&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Read per user&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadPerUser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Combine state pardo&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">CombiningStateDofn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// combiningStateFn keeps track of the number of elements seen.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">combiningStateFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// types are the types of the accumulator, input, and output respectively
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">Val&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Combining&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">combiningStateFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">book&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitWords&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Get the value stored in our state
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">val&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Val&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Val&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">val&lt;/span> &lt;span class="p">&amp;gt;&lt;/span> &lt;span class="mi">10000&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Example of clearing and starting again with an empty bag
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Val&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">combineState&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">input&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// CombineFn param can be a simple fn like this or a structural CombineFn
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">cFn&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeCombiningState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;stateKey&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">b&lt;/span> &lt;span class="kt">int&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">a&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">b&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">combined&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">combiningStateFn&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">Val&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">cFn&lt;/span>&lt;span class="p">},&lt;/span> &lt;span class="nx">input&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
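&lt;p>To make the semantics concrete, here is a minimal plain-Python sketch of what a combining state does for each key. It is a toy model and not the Beam API: the class &lt;code>CombiningStateModel&lt;/code> and its methods are hypothetical names, and it assumes a combiner such as &lt;code>sum&lt;/code> whose accumulator, input, and output types are all the same.&lt;/p>

```python
from typing import Callable, Dict, Iterable


class CombiningStateModel:
    """Toy model of a per-key combining state (hypothetical, not the Beam API)."""

    def __init__(self, combine_fn: Callable[[Iterable[int]], int]) -> None:
        self._combine_fn = combine_fn
        self._accumulators: Dict[str, int] = {}

    def add(self, key: str, value: int) -> None:
        # Fold the new value into the key's accumulator.
        if key in self._accumulators:
            inputs = [self._accumulators[key], value]
        else:
            inputs = [value]
        self._accumulators[key] = self._combine_fn(inputs)

    def read(self, key: str) -> int:
        # A key with no state reads as the combiner applied to no inputs.
        if key in self._accumulators:
            return self._accumulators[key]
        return self._combine_fn([])

    def clear(self, key: str) -> None:
        # Drop the accumulator so the key starts again from empty.
        self._accumulators.pop(key, None)


state = CombiningStateModel(sum)
for user in ["alice", "bob", "alice"]:
    state.add(user, 1)

counts = {user: state.read(user) for user in ("alice", "bob", "carol")}
print(counts)
```

&lt;p>The point of the sketch is that callers only ever add inputs; the stored value is always the combiner's result, so counting elements needs no explicit read-modify-write step.&lt;/p>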
&lt;h4 id="bagstate">BagState&lt;/h4>
&lt;p>A common use case for state is to accumulate multiple elements. &lt;code>BagState&lt;/code> accumulates an unordered collection
of elements. Elements can be added to the bag without first reading the entire collection, which makes appends
efficient. In addition, runners that support paged reads can allow individual bags larger than available
memory.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">numElements&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">bag&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Add the current element to the bag for this key.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">add&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">shouldFetch&lt;/span>&lt;span class="o">())&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Occasionally we fetch and process the values.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">values&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">processValues&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">values&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span> &lt;span class="c1">// Clear the state for this key.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">BagStateDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ALL_ELEMENTS&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BagStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;buffer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">coders&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">VarIntCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element_pair&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ALL_ELEMENTS&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element_pair&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">should_fetch&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">all_elements&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">list&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">process_values&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">all_elements&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Read per user&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadPerUser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Bag state pardo&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">BagStateDoFn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// bagStateFn only emits words that haven&amp;#39;t been seen
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">bagStateFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Bag&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bag&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">bagStateFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">book&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitWords&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Get all values we&amp;#39;ve written to this bag state in this window.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">vals&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ok&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">ok&lt;/span> &lt;span class="o">||&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nf">contains&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">vals&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emitWords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">vals&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">&amp;gt;&lt;/span> &lt;span class="mi">10000&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Example of clearing and starting again with an empty bag
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">p&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="deferred-state-reads">11.2. Deferred state reads&lt;/h3>
&lt;p>When a &lt;code>DoFn&lt;/code> contains multiple state specifications, reading each one in order can be slow. Calling
&lt;code>read()&lt;/code> on a state can cause the runner to perform a blocking read, and performing several blocking reads
in sequence adds latency to element processing. If you know that a state will always be read, you can annotate it with
&lt;code>@AlwaysFetched&lt;/code> so that the runner can prefetch all of the necessary states. For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state1&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">state1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state2&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">state2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state3&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">state3&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">bag&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@AlwaysFetched&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state1&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state1&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@AlwaysFetched&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state2&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state2&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@AlwaysFetched&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state3&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state3&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state1&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state3&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">This&lt;/span> &lt;span class="ow">is&lt;/span> &lt;span class="ow">not&lt;/span> &lt;span class="n">supported&lt;/span> &lt;span class="n">yet&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">see&lt;/span> &lt;span class="n">https&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="o">//&lt;/span>&lt;span class="n">github&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">com&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="n">apache&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="n">issues&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="mf">20739.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">This&lt;/span> &lt;span class="nx">is&lt;/span> &lt;span class="nx">not&lt;/span> &lt;span class="nx">supported&lt;/span> &lt;span class="nx">yet&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">see&lt;/span> &lt;span class="nx">https&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="c1">//github.com/apache/beam/issues/22964.
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>If, however, there are code paths in which the states are not read, annotating them with
&lt;code>@AlwaysFetched&lt;/code> adds unnecessary fetching for those paths. In this case, the &lt;code>readLater&lt;/code> method
tells the runner that the state will be read in the future, which allows multiple state reads to be batched together.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state1&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">state1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state2&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">state2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state3&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">state3&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">bag&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state1&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state1&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state2&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state2&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state3&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state3&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="cm">/* should read state */&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state1&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">readLater&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">readLater&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state3&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">readLater&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The runner can now batch all three states into a single read, reducing latency.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">processState1&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">state1&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">processState2&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">state2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">processState3&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">state3&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
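&lt;p>To make the latency argument concrete, the following is a toy model in plain Python. It does not use the Beam API:
&lt;code>FakeBackend&lt;/code>, &lt;code>FakeState&lt;/code>, &lt;code>make_states&lt;/code> and &lt;code>read_later&lt;/code> are hypothetical
names invented for this sketch, and a real runner may batch reads differently. It only counts how many blocking round
trips occur when reads are issued one by one versus announced up front.&lt;/p>

```python
# Toy model only: none of these classes are part of the Apache Beam API.
class FakeBackend:
    """Pretend state store that counts blocking round trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def fetch(self, names):
        # One call models one blocking round trip, however many names it carries.
        self.round_trips += 1
        return {name: self.data.get(name) for name in names}


class FakeState:
    """Handle for one named state cell, with read() and read_later()."""

    def __init__(self, backend, name, pending, cache):
        self.backend = backend
        self.name = name
        self.pending = pending  # names announced but not yet fetched (shared)
        self.cache = cache      # values already fetched (shared)

    def read_later(self):
        # Only records the intent to read; no round trip happens here.
        if self.name not in self.cache:
            self.pending.add(self.name)
        return self

    def read(self):
        if self.name not in self.cache:
            # Fetch this state together with everything announced so far.
            wanted = sorted(self.pending | {self.name})
            self.cache.update(self.backend.fetch(wanted))
            self.pending.clear()
        return self.cache[self.name]


def make_states(backend, names):
    """Create handles that share one pending set and one cache."""
    pending, cache = set(), {}
    return [FakeState(backend, n, pending, cache) for n in names]


DATA = {"state1": 7, "state2": "seven", "state3": [1, 2, 3]}

# Sequential reads: each read() triggers its own blocking round trip.
sequential = FakeBackend(DATA)
for s in make_states(sequential, DATA):
    s.read()

# Announce all reads first: the first read() fetches everything at once.
batched = FakeBackend(DATA)
states = make_states(batched, DATA)
for s in states:
    s.read_later()
values = [s.read() for s in states]

print(sequential.round_trips, batched.round_trips)  # prints "3 1"
```

&lt;p>In this model the sequential variant pays for three round trips and the announced variant for one, which is the
effect that prefetching and deferred reads aim for.&lt;/p>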
&lt;h3 id="timers">11.3. Timers&lt;/h3>
&lt;p>Beam provides a per-key timer callback API. This allows for delayed processing of data stored using the state API.
Timers can be set to call back at either an event-time or a processing-time timestamp. Every timer is identified with a
&lt;code>TimerId&lt;/code>. A given timer for a key can only be set for a single timestamp. Calling set on a timer overwrites
the previous firing time for that key&amp;rsquo;s timer.&lt;/p>
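&lt;p>The overwrite rule can be illustrated with a toy model in plain Python. This is not the Beam timer API:
&lt;code>ToyTimers&lt;/code> and &lt;code>advance_to&lt;/code> are hypothetical names, and a single integer clock stands in for
whichever time domain the timer uses. The sketch assumes a timer fires once the clock reaches its firing time.&lt;/p>

```python
# Toy model only: these classes are not part of the Apache Beam API.
class ToyTimers:
    """Keeps at most one firing time per (key, timer_id) pair."""

    def __init__(self):
        self.firing_times = {}

    def set(self, key, timer_id, timestamp):
        # Setting again replaces the earlier firing time for the same key and timer.
        self.firing_times[(key, timer_id)] = timestamp

    def advance_to(self, now):
        """Return and remove the timers whose firing time the clock has reached."""
        due = sorted(
            (ts, key, timer_id)
            for (key, timer_id), ts in self.firing_times.items()
            if now >= ts
        )
        for ts, key, timer_id in due:
            del self.firing_times[(key, timer_id)]
        return due


timers = ToyTimers()
timers.set("alice", "timer", 10)
timers.set("alice", "timer", 30)  # overwrites the firing time of 10
timers.set("bob", "timer", 20)    # a different key has its own timer

fired_at_25 = timers.advance_to(25)
fired_at_35 = timers.advance_to(35)
print(fired_at_25)  # [(20, 'bob', 'timer')]
print(fired_at_35)  # [(30, 'alice', 'timer')]
```

&lt;p>Because the second call to &lt;code>set&lt;/code> for the same key and timer replaced the first, nothing fires for that key
at time 10; its timer fires only after the clock reaches 30.&lt;/p>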
&lt;h4 id="event-time-timers">11.3.1. Event-time timers&lt;/h4>
&lt;p>Event-time timers fire when the input watermark for the &lt;code>DoFn&lt;/code> passes the timestamp for which the timer
is set, meaning that the runner believes that there are no more elements to be processed with timestamps before the
timer timestamp. This allows for event-time aggregations.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">state&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TimerSpec&lt;/span> &lt;span class="n">timer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">EVENT_TIME&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Timestamp&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">elementTs&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">Timer&lt;/span> &lt;span class="n">timer&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Set an event-time timer to the element timestamp.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">elementTs&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@OnTimer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">onTimer&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Process timer.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">EventTimerDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ALL_ELEMENTS&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BagStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;buffer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">coders&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">VarIntCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TIMER&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;timer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WATERMARK&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">element_pair&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampParam&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ALL_ELEMENTS&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimerParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TIMER&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element_pair&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Set an event-time timer to the element timestamp.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@on_timer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TIMER&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expiry_callback&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">buffer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ALL_ELEMENTS&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Read per user&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadPerUser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;EventTime timer pardo&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">EventTimerDoFn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">eventTimerDoFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">State&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Value&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">eventTimerDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ts&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">book&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitWords&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Set an event-time timer to the element timestamp.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ts&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ToTime&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">eventTimerDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">OnTimer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitWords&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">switch&lt;/span> &lt;span class="nx">timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// process callback for this timer
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">AddEventTimeDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">in&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">eventTimerDoFn&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Timers are given family names so their callbacks can be handled independently.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">Timer&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">InEventTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;processWatermark&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">State&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeValueState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;latest&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">in&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="processing-time-timers">11.3.2. Processing-time timers&lt;/h4>
&lt;p>Processing-time timers fire when the wall-clock time passes the timestamp the timer was set to. They are often used to
accumulate larger batches of data before processing, and can also schedule events that should occur at a specific time.
As with event-time timers, processing-time timers are per key: each key has a separate copy of the timer.&lt;/p>
&lt;p>While processing-time timers can be set to an absolute timestamp, it is very common to set them to an offset relative
to the current time. In Java, the &lt;code>Timer.offset&lt;/code> and &lt;code>Timer.setRelative&lt;/code> methods can be used to accomplish this.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TimerSpec&lt;/span> &lt;span class="n">timer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">PROCESSING_TIME&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">Timer&lt;/span> &lt;span class="n">timer&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Set a timer to go off 30 seconds in the future.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">offset&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">30&lt;/span>&lt;span class="o">)).&lt;/span>&lt;span class="na">setRelative&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@OnTimer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">onTimer&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Process timer.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ProcessingTimerDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ALL_ELEMENTS&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BagStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;buffer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">coders&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">VarIntCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TIMER&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;timer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">REAL_TIME&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">element_pair&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ALL_ELEMENTS&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimerParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TIMER&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element_pair&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Set a timer to go off 30 seconds in the future.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">Timestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">Duration&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">seconds&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">30&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@on_timer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TIMER&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expiry_callback&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">buffer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ALL_ELEMENTS&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Process timer.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Read per user&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadPerUser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ProcessingTime timer pardo&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ProcessingTimerDoFn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">processingTimerDoFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ProcessingTime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">processingTimerDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">book&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">word&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitWords&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Set a timer to go off 30 seconds in the future.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Now&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">30&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Second&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">processingTimerDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">OnTimer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitWords&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">switch&lt;/span> &lt;span class="nx">timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// process callback for this timer
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">AddProcessingTimeDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">in&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">processingTimerDoFn&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Timers are given family names so their callbacks can be handled independently.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">Timer&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">InProcessingTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;timer&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">in&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
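&lt;p>The batching use of processing-time timers described in this section can be modelled without a runner. The following is a minimal sketch in plain Python, not Beam code: &lt;code>BatchingModel&lt;/code>, &lt;code>process&lt;/code>, and &lt;code>advance_to&lt;/code> are hypothetical names introduced for illustration, and wall-clock time is passed in as a number of seconds so that the behaviour is deterministic. It assumes each key has its own buffer and its own timer, and that every element resets the timer for its key to 30 seconds after the element arrives.&lt;/p>

```python
class BatchingModel:
    """Toy model of per-key buffering flushed by a processing-time timer."""

    def __init__(self, delay_seconds=30):
        self.delay = delay_seconds
        self.buffers = {}    # key -> list of buffered values
        self.deadlines = {}  # key -> time at which that key's timer fires

    def process(self, key, value, now):
        # Buffer the value and (re)set this key's timer relative to "now".
        self.buffers.setdefault(key, []).append(value)
        self.deadlines[key] = now + self.delay

    def advance_to(self, now):
        # Fire every timer whose deadline has passed; return the flushed batches.
        due = [k for k, t in self.deadlines.items() if now >= t]
        flushed = {}
        for key in due:
            flushed[key] = self.buffers.pop(key)
            del self.deadlines[key]
        return flushed


model = BatchingModel()
model.process("alice", 1, now=0)
model.process("bob", 7, now=5)
model.process("alice", 2, now=10)   # resets alice's timer to fire at 40

print(model.advance_to(35))  # only bob's timer (deadline 35) has fired
print(model.advance_to(40))  # alice's batch of two values is flushed
```

&lt;p>Because the timer is reset on every element, a key that keeps receiving data is flushed only after 30 seconds of inactivity. Setting the timer only when the buffer is empty would instead bound the age of the oldest buffered element.&lt;/p>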
&lt;h4 id="dynamic-timer-tags">11.3.3. Dynamic timer tags&lt;/h4>
&lt;p>Beam also supports setting timer tags dynamically, using &lt;code>TimerMap&lt;/code> in the Java SDK. This lets a &lt;code>DoFn&lt;/code> set multiple
different timers and choose their tags at runtime, for example based on data in the input elements. A
timer with a specific tag can only be set to a single timestamp, so setting the timer again
overwrites the previous expiration time for the timer with that tag. Each &lt;code>TimerMap&lt;/code> is identified by a timer family
id, and timers in different timer families are independent.&lt;/p>
&lt;p>In the Python SDK, a dynamic timer tag can be specified when calling &lt;code>set()&lt;/code> or &lt;code>clear()&lt;/code>. If no tag is
specified, the timer tag defaults to an empty string.&lt;/p>
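&lt;p>The overwrite rule can be illustrated with a small plain-Python model. This is a sketch only, not Beam's implementation: the &lt;code>TimerTable&lt;/code> class and its methods are hypothetical names introduced here, and timestamps are plain integers. It assumes timers are stored per (family, tag) pair, so setting the same tag again replaces its timestamp while other tags and other families are left untouched.&lt;/p>

```python
class TimerTable:
    """Toy model of dynamic timers for a single key (hypothetical, not a Beam API)."""

    def __init__(self):
        # Maps (family, tag) to the single timestamp that timer is set to.
        self._timers = {}

    def set(self, family, tag, timestamp):
        # Setting an existing (family, tag) overwrites its previous timestamp.
        self._timers[(family, tag)] = timestamp

    def clear(self, family, tag):
        # Clearing an unset timer is a no-op in this model.
        self._timers.pop((family, tag), None)

    def pending(self, family):
        # Return {tag: timestamp} for every timer still set in the family.
        return {t: ts for (f, t), ts in self._timers.items() if f == family}


table = TimerTable()
table.set("actionTimers", "click", 10)
table.set("actionTimers", "purchase", 20)
table.set("actionTimers", "click", 35)   # overwrites the timer set to 10
table.set("otherTimers", "click", 99)    # same tag, different family: independent

print(table.pending("actionTimers"))
print(table.pending("otherTimers"))
```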
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerFamily&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;actionTimers&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TimerSpec&lt;/span> &lt;span class="n">timer&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TimerSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timerMap&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">EVENT_TIME&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Timestamp&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">elementTs&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerFamily&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;actionTimers&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">TimerMap&lt;/span> &lt;span class="n">timers&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timers&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getActionType&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="n">elementTs&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@OnTimerFamily&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;actionTimers&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">onTimer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@TimerId&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">timerId&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">LOG&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">info&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Timer fired with id &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">timerId&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">TimerDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ALL_ELEMENTS&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BagStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;buffer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">coders&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">VarIntCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TIMER&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;timer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">REAL_TIME&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">element_pair&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ALL_ELEMENTS&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimerParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TIMER&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element_pair&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Set a timer to go off 30 seconds in the future with dynamic timer tag &amp;#39;first_timer&amp;#39;.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># And set a timer to go off 60 seconds in the future with dynamic timer tag &amp;#39;second_timer&amp;#39;.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">Timestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">Duration&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">seconds&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">30&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">dynamic_timer_tag&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;first_timer&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">Timestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">Duration&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">seconds&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">dynamic_timer_tag&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;second_timer&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Note that a timer can also be explicitly cleared if previously set with a dynamic timer tag:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># timer.clear(dynamic_timer_tag=...)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@on_timer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TIMER&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expiry_callback&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">buffer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ALL_ELEMENTS&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">timer_tag&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DynamicTimerTagParam&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Process timer, the dynamic timer tag associated with expiring timer can be read back with DoFn.DynamicTimerTagParam.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">timer_tag&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;fired&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Read per user&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadPerUser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ProcessingTime timer pardo&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TimerDoFn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">hasAction&lt;/span> &lt;span class="kd">interface&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">Action&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="kt">string&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">dynamicTagsDoFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span> &lt;span class="nx">hasAction&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">dynamicTagsDoFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ts&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span> &lt;span class="nx">V&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitWords&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Set a timer to go off 30 seconds in the future.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ts&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ToTime&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WithTag&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">value&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Action&lt;/span>&lt;span class="p">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">dynamicTagsDoFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="nf">OnTimer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emitWords&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">switch&lt;/span> &lt;span class="nx">timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">tag&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Tag&lt;/span> &lt;span class="c1">// Do something with fired tag
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">tag&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nx">AddDynamicTimerTagsDoFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span> &lt;span class="nx">hasAction&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">in&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">dynamicTagsDoFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">]{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Timer&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">InEventTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;actionTimers&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">in&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="timer-output-timestamps">11.3.4. Timer output timestamps&lt;/h4>
&lt;p>By default, event-time timers hold the output watermark of the &lt;code>ParDo&lt;/code> to the timestamp of the timer. This means
that if a timer is set to 12pm, any windowed aggregations or event-time timers later in the pipeline graph that finish
after 12pm will not expire while the timer is still pending. The timestamp of the timer is also the default output timestamp for the timer callback. This
means that any elements output from the &lt;code>onTimer&lt;/code> method will have a timestamp equal to the timestamp of the timer firing.
For processing-time timers, the default output timestamp and watermark hold are both the value of the input watermark at the
time the timer was set.&lt;/p>
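&lt;p>The default hold for event-time timers can be pictured with a small toy model. The sketch below is plain Python rather than Beam code: the &lt;code>ToyStatefulStage&lt;/code> class and its methods are hypothetical names invented for illustration, timestamps are plain integers, and real runners track watermarks in far more detail. It only assumes that the output watermark is the minimum of the input watermark and the timestamps of all pending event-time timers.&lt;/p>

```python
from typing import List, Tuple


class ToyStatefulStage:
    """Toy model of a stateful ParDo; hypothetical, not part of the Beam API."""

    def __init__(self) -> None:
        self.input_watermark = 0
        self.pending_timers: List[int] = []

    def set_event_time_timer(self, timestamp: int) -> None:
        self.pending_timers.append(timestamp)

    def advance_input_watermark(self, watermark: int) -> None:
        # Watermarks only move forward.
        self.input_watermark = max(self.input_watermark, watermark)

    def output_watermark(self) -> int:
        # Assumption: each pending event-time timer holds the output
        # watermark at its own timestamp.
        return min([self.input_watermark] + self.pending_timers)

    def fire_eligible_timers(self) -> List[Tuple[int, str]]:
        # A timer is eligible once the input watermark has reached it.
        eligible = sorted(
            t for t in self.pending_timers if self.input_watermark >= t)
        self.pending_timers = [
            t for t in self.pending_timers if t > self.input_watermark]
        # Outputs default to the timestamp of the timer that produced them.
        return [(t, "fired") for t in eligible]


stage = ToyStatefulStage()
stage.set_event_time_timer(12)
stage.advance_input_watermark(13)
held = stage.output_watermark()          # held back by the pending timer
outputs = stage.fire_eligible_timers()   # callback output stamped with 12
released = stage.output_watermark()      # hold released after firing
```

&lt;p>In this model, a timer set for 12 keeps the output watermark at 12 even after the input watermark reaches 13, and the hold is released only once the timer has fired.&lt;/p>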
&lt;p>In some cases, a DoFn needs to output timestamps earlier than the timer expiration time, and therefore also needs to
hold its output watermark to those timestamps. For example, consider the following pipeline that temporarily batches
records into state, and sets a timer to drain the state. This code may appear correct, but will not work properly.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;elementBag&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">elementBag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">bag&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timerSet&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Boolean&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">timerSet&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;outputState&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TimerSpec&lt;/span> &lt;span class="n">timer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">PROCESSING_TIME&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;elementBag&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">elementBag&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timerSet&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Boolean&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">timerSet&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;outputState&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">Timer&lt;/span> &lt;span class="n">timer&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Add the current element to the bag for this key.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">elementBag&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">add&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(!&lt;/span>&lt;span class="n">MoreObjects&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">firstNonNull&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">timerSet&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="kc">false&lt;/span>&lt;span class="o">))&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// If the timer is not current set, then set it to go off in a minute.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">offset&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">)).&lt;/span>&lt;span class="na">setRelative&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timerSet&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@OnTimer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;outputState&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">onTimer&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;elementBag&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">elementBag&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timerSet&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Boolean&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">timerSet&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">ValueT&lt;/span> &lt;span class="n">bufferedElement&lt;/span> &lt;span class="o">:&lt;/span> &lt;span class="n">elementBag&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">())&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Output each element.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">outputWithTimestamp&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">bufferedElement&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">bufferedElement&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timestamp&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">elementBag&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Note that the timer has now fired.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">timerSet&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">badTimerOutputTimestampsFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span> &lt;span class="nx">any&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">ElementBag&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bag&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">TimerSet&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Value&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">bool&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">OutputState&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ProcessingTime&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">badTimerOutputTimestampsFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span> &lt;span class="nx">V&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Add the current element to the bag for this key.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ElementBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">set&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">TimerSet&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">set&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">OutputState&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Now&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">TimerSet&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kc">true&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">badTimerOutputTimestampsFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="nf">OnTimer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">switch&lt;/span> &lt;span class="nx">timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">OutputState&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">vs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ElementBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="k">range&lt;/span> &lt;span class="nx">vs&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Output each element
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Sprintf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;%v&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ElementBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Note that the timer has now fired.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">TimerSet&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>The problem with this code is that the ParDo buffers elements, but nothing prevents the watermark
from advancing past the timestamps of those elements, so they might all be dropped as late data when they are
finally output. To prevent this, set an output timestamp on the timer. The output timestamp holds the watermark
so that it cannot advance past the timestamp of the earliest buffered element until the timer fires. The following code demonstrates this.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The bag of elements accumulated.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;elementBag&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">elementBag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">bag&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The timestamp of the timer set.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timerTimestamp&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">timerTimestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The minimum timestamp stored in the bag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;minTimestampInBag&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">CombiningState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span>&lt;span class="o">[],&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">minTimestampInBag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">combining&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Min&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ofLongs&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;outputState&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TimerSpec&lt;/span> &lt;span class="n">timer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">PROCESSING_TIME&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;elementBag&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">elementBag&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@AlwaysFetched&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timerTimestamp&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">timerTimestamp&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@AlwaysFetched&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;minTimestampInBag&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">CombiningState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span>&lt;span class="o">[],&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">minTimestamp&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;outputState&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">Timer&lt;/span> &lt;span class="n">timer&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Add the current element to the bag for this key.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">elementBag&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">add&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Keep track of the minimum element timestamp currently stored in the bag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">minTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">add&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">timestamp&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// If the timer is already set, then reset it at the same time but with an updated output timestamp (otherwise
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// we would keep resetting the timer to the future). If there is no timer set, then set one to expire in a minute.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Long&lt;/span> &lt;span class="n">timerTimestampMs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">timerTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Instant&lt;/span> &lt;span class="n">timerToSet&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">timerTimestampMs&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="kc">null&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">?&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">now&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">plus&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">))&lt;/span> &lt;span class="o">:&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">timerTimestampMs&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Setting the outputTimestamp to the minimum timestamp in the bag holds the watermark to that timestamp until the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// timer fires. This allows outputting all the elements with their timestamp.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">withOutputTimestamp&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">minTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">())).&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">timerToSet&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timerTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">timerToSet&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getMillis&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@OnTimer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;outputState&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">onTimer&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;elementBag&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">elementBag&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;timerTimestamp&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">timerTimestamp&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">ValueT&lt;/span> &lt;span class="n">bufferedElement&lt;/span> &lt;span class="o">:&lt;/span> &lt;span class="n">elementBag&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">())&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Output each element.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">outputWithTimestamp&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">bufferedElement&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">bufferedElement&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timestamp&lt;/span>&lt;span class="o">());&lt;/span>
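&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Clear the bag so the same elements are not output again on the next firing.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">elementBag&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>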
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Note that the timer has now fired.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">timerTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Timer&lt;/span> &lt;span class="n">output&lt;/span> &lt;span class="n">timestamps&lt;/span> &lt;span class="n">are&lt;/span> &lt;span class="ow">not&lt;/span> &lt;span class="n">yet&lt;/span> &lt;span class="n">supported&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">Python&lt;/span> &lt;span class="n">SDK&lt;/span>&lt;span class="o">.&lt;/span> &lt;span class="n">See&lt;/span> &lt;span class="n">https&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="o">//&lt;/span>&lt;span class="n">github&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">com&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="n">apache&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="n">issues&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="mf">20705.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
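&lt;p>To make the watermark hold concrete, the following is a minimal, SDK-independent sketch of the idea. It is a toy
model, not Beam API: the &lt;code>ToyStage&lt;/code> class and its methods are hypothetical names introduced only for
illustration, timestamps are plain integers, and windows and allowed lateness are ignored. It assumes that the output
watermark of a stage is the minimum of its input watermark and the output timestamps of all pending timers.&lt;/p>

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyStage:
    """Toy model of one stateful stage. Hypothetical; not part of the Beam API."""

    input_watermark: int = 0
    # Output timestamp held by the pending timer of each key, if any.
    holds: Dict[str, int] = field(default_factory=dict)

    def set_timer(self, key: str, output_timestamp: int) -> None:
        # Re-setting the timer for a key replaces its previous hold.
        self.holds[key] = output_timestamp

    def fire_timer(self, key: str) -> None:
        # Once the timer has fired, its hold is released.
        self.holds.pop(key, None)

    def output_watermark(self) -> int:
        # The output watermark cannot pass the earliest pending hold.
        return min([self.input_watermark, *self.holds.values()])

    def would_drop(self, element_timestamp: int) -> bool:
        # Simplification: an element counts as late when the output
        # watermark is already past its timestamp.
        return self.output_watermark() > element_timestamp


# With a hold at the earliest buffered timestamp (100), the output
# watermark stays at 100 even though the input watermark reaches 500.
stage = ToyStage()
stage.set_timer("user-1", output_timestamp=100)
stage.input_watermark = 500
held = stage.output_watermark()
dropped_with_hold = stage.would_drop(100)

# After the timer fires, the hold is released and the watermark catches up.
stage.fire_timer("user-1")
released = stage.output_watermark()

# Without any hold, the same element would already be behind the watermark.
dropped_without_hold = ToyStage(input_watermark=500).would_drop(100)
```

&lt;p>In this model the buffered element at timestamp 100 is still on time while the hold is in place, which mirrors why
the timer's output timestamp is set to the minimum timestamp in the bag.&lt;/p>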
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">element&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span> &lt;span class="nx">any&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Timestamp&lt;/span> &lt;span class="kt">int64&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Value&lt;/span> &lt;span class="nx">V&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">goodTimerOutputTimestampsFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span> &lt;span class="nx">any&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">ElementBag&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bag&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">element&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">]]&lt;/span> &lt;span class="c1">// The bag of elements accumulated.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">TimerTimerstamp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Value&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1">// The timestamp of the timer set.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">MinTimestampInBag&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Combining&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1">// The minimum timestamp stored in the bag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">OutputState&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ProcessingTime&lt;/span> &lt;span class="c1">// The processing-time timer that flushes the bag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">goodTimerOutputTimestampsFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">et&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span> &lt;span class="nx">V&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Add the current element to the bag for this key, and preserve the event time.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ElementBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">element&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">]{&lt;/span>&lt;span class="nx">Timestamp&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">et&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Milliseconds&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="nx">Value&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">value&lt;/span>&lt;span class="p">});&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Keep track of the minimum element timestamp currently stored in the bag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MinTimestampInBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">et&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Milliseconds&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// If the timer is already set, then reset it at the same time but with an updated output timestamp (otherwise
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// we would keep resetting the timer to the future). If there is no timer set, then set one to expire in a minute.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">ts&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">ok&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">TimerTimerstamp&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">var&lt;/span> &lt;span class="nx">tsToSet&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Time&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">ok&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">tsToSet&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">UnixMilli&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ts&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">tsToSet&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Now&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">minTs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MinTimestampInBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">outputTs&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">UnixMilli&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">minTs&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Setting the outputTimestamp to the minimum timestamp in the bag holds the watermark to that timestamp until the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// timer fires. This allows outputting all the elements with their timestamp.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">OutputState&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tsToSet&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WithOutputTimestamp&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">outputTs&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">TimerTimerstamp&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tsToSet&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">UnixMilli&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">goodTimerOutputTimestampsFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="nf">OnTimer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">switch&lt;/span> &lt;span class="nx">timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">OutputState&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">vs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ElementBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="k">range&lt;/span> &lt;span class="nx">vs&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Output each element with its timestamp
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">EventTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">v&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Timestamp&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">fmt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Sprintf&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;%v&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">v&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Value&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ElementBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Note that the timer has now fired.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">TimerTimerstamp&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nx">AddTimedOutputBatching&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span> &lt;span class="nx">any&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">in&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">goodTimerOutputTimestampsFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">]{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">ElementBag&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeBagState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">element&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">]](&lt;/span>&lt;span class="s">&amp;#34;elementBag&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">TimerTimerstamp&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeValueState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;timerTimestamp&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">MinTimestampInBag&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeCombiningState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;minTimestampInBag&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">b&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">int64&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">a&lt;/span> &lt;span class="p">&amp;lt;&lt;/span> &lt;span class="nx">b&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">a&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">b&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">OutputState&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">InProcessingTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;outputState&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">in&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="garbage-collecting-state">11.4. Garbage collecting state&lt;/h3>
&lt;p>Per-key state needs to be garbage collected, or eventually the increasing size of state may negatively impact
performance. There are two common strategies for garbage collecting state.&lt;/p>
&lt;h5 id="using-windows-for-garbage-collection">11.4.1. &lt;strong>Using windows for garbage collection&lt;/strong>&lt;/h5>
&lt;p>All state and timers for a key are scoped to the window the key is in. This means that the &lt;code>ParDo&lt;/code> sees
different values for the state depending on which window the timestamp of the input element falls into. In
addition, once the input watermark passes the end of the window, the runner should garbage collect all state for that
window. (Note: if allowed lateness is set to a positive value for the window, the runner must wait for the watermark to
pass the end of the window plus the allowed lateness before garbage collecting state.) This can be used as a
garbage-collection strategy.&lt;/p>
&lt;p>For example, given the following:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Window&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">CalendarWindows&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">days&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withTimeZone&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">DateTimeZone&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">forID&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;America/Los_Angeles&amp;#34;&lt;/span>&lt;span class="o">))))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">state&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Timestamp&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">ts&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The state is scoped to a calendar day window. That means that if the input timestamp ts is after
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// midnight Pacific Time, then a new copy of the state will be seen for the next day.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">StateDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ALL_ELEMENTS&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BagStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;buffer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">coders&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">VarIntCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">element_pair&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ALL_ELEMENTS&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Read per user&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadPerUser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Windowing&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">FixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">60&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">60&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">24&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;DoFn&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">StateDoFn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">windowed&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">WindowInto&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">window&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewFixedWindows&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">24&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Hour&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">elements&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">out&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">statefulDoFn&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">S&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeValueState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;S&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">windowed&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>This &lt;code>ParDo&lt;/code> stores state per day. Once the pipeline is done processing data for a given day, all the state for that
day is garbage collected.&lt;/p>
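&lt;p>To make the timing concrete, the following is a minimal sketch in plain Python that models the rule described
above. It does not use the Beam API: &lt;code>WindowedStateStore&lt;/code>, &lt;code>window_end&lt;/code>, and
&lt;code>advance_watermark&lt;/code> are hypothetical names introduced only for illustration, timestamps are plain seconds,
and the exact boundary comparison is a simplifying assumption rather than a statement about any particular runner.&lt;/p>

```python
from dataclasses import dataclass, field

DAY_SECONDS = 24 * 60 * 60  # one-day fixed windows, as in the snippets above


def window_end(timestamp, size=DAY_SECONDS):
    """Return the exclusive end of the fixed window that contains timestamp."""
    return (timestamp // size + 1) * size


@dataclass
class WindowedStateStore:
    """Toy stand-in for a runner's state backend (hypothetical, not a Beam class)."""

    allowed_lateness: int = 0
    cells: dict = field(default_factory=dict)  # (key, window_end) maps to a state value

    def add(self, key, timestamp, value):
        # State is looked up by key AND window, so each day gets its own cell.
        cell = (key, window_end(timestamp))
        self.cells[cell] = self.cells.get(cell, 0) + value

    def advance_watermark(self, watermark):
        # Drop every cell whose window end plus allowed lateness has been reached.
        expired = [cell for cell in self.cells
                   if watermark >= cell[1] + self.allowed_lateness]
        for cell in expired:
            del self.cells[cell]
        return sorted(expired)


if __name__ == "__main__":
    store = WindowedStateStore(allowed_lateness=3600)
    store.add("alice", 10, 1)               # first day
    store.add("alice", 20, 2)               # first day, same state cell
    store.add("alice", DAY_SECONDS + 5, 7)  # second day, a fresh state cell
    print(store.advance_watermark(DAY_SECONDS))         # still within allowed lateness
    print(store.advance_watermark(DAY_SECONDS + 3600))  # first-day state is collected
    print(store.cells)
```

&lt;p>In this sketch the first-day state survives until the watermark reaches the end of the window plus the one hour
of allowed lateness, while the second-day state is kept in a separate cell and is unaffected.&lt;/p>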
&lt;h5 id="using-timers-for-garbage-collection">11.4.2. &lt;strong>Using timers for garbage collection&lt;/strong>&lt;/h5>
&lt;p>In some cases, it is difficult to find a windowing strategy that models the desired garbage-collection strategy. For
example, a common desire is to garbage collect state for a key once no activity has been seen on the key for some time.
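This can be done by resetting a timer each time an element arrives for the key, so that the timer fires and garbage
collects the state only after the key has been idle. For example:&lt;/p>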
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The state for the key.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">state&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The maximum element timestamp seen so far.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;maxTimestampSeen&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">CombiningState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span>&lt;span class="o">[],&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">maxTimestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">combining&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Max&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ofLongs&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gcTimer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TimerSpec&lt;/span> &lt;span class="n">gcTimer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">EVENT_TIME&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Timestamp&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">ts&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;maxTimestampSeen&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">CombiningState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span>&lt;span class="o">[],&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">maxTimestamp&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gcTimer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">Timer&lt;/span> &lt;span class="n">gcTimer&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">updateState&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">state&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">maxTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">add&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getMillis&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Set the timer to be one hour after the maximum timestamp seen. This will keep overwriting the same timer, so
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// as long as there is activity on this key the state will stay active. Once the key goes inactive for one hour&amp;#39;s
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// worth of event time (as measured by the watermark), then the gc timer will fire.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">expirationTime&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">maxTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">()).&lt;/span>&lt;span class="na">plus&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardHours&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">expirationTime&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@OnTimer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gcTimer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">onTimer&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;maxTimestampSeen&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">CombiningState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span>&lt;span class="o">[],&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">maxTimestamp&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Clear all state for the key.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">maxTimestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">UserDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ALL_ELEMENTS&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BagStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;state&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">coders&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">VarIntCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MAX_TIMESTAMP&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">CombiningValueStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;max_timestamp_seen&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">max&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TIMER&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;gc-timer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WATERMARK&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">t&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampParam&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ALL_ELEMENTS&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_timestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">MAX_TIMESTAMP&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimerParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TIMER&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">update_state&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">state&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_timestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">t&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">micros&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Set the timer to be one hour after the maximum timestamp seen. This will keep overwriting the same timer, so&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># as long as there is activity on this key the state will stay active. Once the key goes inactive for one hour&amp;#39;s&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># worth of event time (as measured by the watermark), then the gc timer will fire.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">expiration_time&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Timestamp&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">micros&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">max_timestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">())&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">Duration&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">seconds&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">expiration_time&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@on_timer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TIMER&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">expiry_callback&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ALL_ELEMENTS&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_timestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">MAX_TIMESTAMP&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_timestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Read per user&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadPerUser&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;User DoFn&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">UserDoFn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">timerGarbageCollectionFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span> &lt;span class="nx">any&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">State&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Value&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1">// The state for the key.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">MaxTimestampInBag&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Combining&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1">// The maximum element timestamp seen so far.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">GcTimer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span> &lt;span class="c1">// The timestamp of the timer.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">timerGarbageCollectionFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">et&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span> &lt;span class="nx">V&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">updateState&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">State&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MaxTimestampInBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">et&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Milliseconds&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Set the timer to be one hour after the maximum timestamp seen. This will keep overwriting the same timer, so
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// as long as there is activity on this key the state will stay active. Once the key goes inactive for one hour&amp;#39;s
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// worth of event time (as measured by the watermark), then the gc timer will fire.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">maxTs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MaxTimestampInBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">expirationTime&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">UnixMilli&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">maxTs&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Hour&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">GcTimer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">expirationTime&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">timerGarbageCollectionFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="nf">OnTimer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">switch&lt;/span> &lt;span class="nx">timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">GcTimer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Clear all the state for the key
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">State&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MaxTimestampInBag&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nx">AddTimerGarbageCollection&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span> &lt;span class="nx">any&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">in&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">timerGarbageCollectionFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">]{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">State&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeValueState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;timerTimestamp&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">MaxTimestampInBag&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeCombiningState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;maxTimestampInBag&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">b&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">int64&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">a&lt;/span> &lt;span class="p">&amp;gt;&lt;/span> &lt;span class="nx">b&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">a&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">b&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">GcTimer&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">InEventTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;gcTimer&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">in&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
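&lt;p>The timer-overwrite behavior used above can be illustrated without a runner. The following is a minimal plain-Python sketch, not Beam API: &lt;code>GcSimulator&lt;/code>, &lt;code>KeyState&lt;/code>, and &lt;code>advance_watermark&lt;/code> are hypothetical names introduced only for illustration, and the sketch assumes that a timer fires as soon as the watermark reaches its timestamp.&lt;/p>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

HOUR_MICROS = 60 * 60 * 1_000_000  # one hour, in microseconds


@dataclass
class KeyState:
    """Hypothetical per-key cell: buffered values, max timestamp, one timer."""
    values: List[int] = field(default_factory=list)
    max_timestamp: Optional[int] = None
    timer: Optional[int] = None  # event time at which the GC timer fires


class GcSimulator:
    """Toy model of per-key state with a single garbage-collection timer."""

    def __init__(self) -> None:
        self.states: Dict[str, KeyState] = {}

    def process(self, key: str, value: int, timestamp: int) -> None:
        state = self.states.setdefault(key, KeyState())
        state.values.append(value)
        if state.max_timestamp is None or timestamp > state.max_timestamp:
            state.max_timestamp = timestamp
        # Setting the timer again replaces the previous firing time.
        state.timer = state.max_timestamp + HOUR_MICROS

    def advance_watermark(self, watermark: int) -> List[str]:
        """Clears state for every key whose timer the watermark has reached."""
        expired = [
            key for key, state in self.states.items()
            if state.timer is not None and state.timer <= watermark
        ]
        for key in expired:
            del self.states[key]
        return sorted(expired)


sim = GcSimulator()
sim.process("user-a", 1, timestamp=0)
sim.process("user-a", 2, timestamp=30 * 60 * 1_000_000)  # 30 minutes later
sim.process("user-b", 7, timestamp=0)

# After one hour only user-b has been idle for a full hour of event time.
first = sim.advance_watermark(HOUR_MICROS)
# After 90 minutes the timer for user-a (30 minutes + 1 hour) is reached too.
second = sim.advance_watermark(90 * 60 * 1_000_000)
```

&lt;p>Because each element resets the single timer to one hour past the largest timestamp seen, a key that keeps receiving elements is never collected; only a key that stays idle for a full hour of event time has its state cleared.&lt;/p>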
&lt;h3 id="state-timers-examples">11.5. State and timers examples&lt;/h3>
&lt;p>The following examples show how state and timers can be used.&lt;/p>
&lt;h4 id="joining-clicks-and-views">11.5.1. Joining clicks and views&lt;/h4>
&lt;p>In this example, the pipeline is processing data from an e-commerce site&amp;rsquo;s home page. There are two input streams:
a stream of views, representing suggested product links displayed to the user on the home page, and a stream of
clicks, representing actual user clicks on these links. The goal of the pipeline is to join click events with view
events, outputting a new joined event that contains information from both events. Each link has a unique identifier
that is present in both the view event and the click event.&lt;/p>
&lt;p>Many view events will never be followed up with clicks. This pipeline will wait one hour for a click, after which it
will give up on this join. While every click event should have a view event, some small number of view events may be
lost and never make it to the Beam pipeline; the pipeline will similarly wait one hour after seeing a click event, and
give up if the view event does not arrive in that time. Input events are not ordered, so the click
event can arrive before the view event. The one-hour join timeout should be based on event time, not on processing time.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Read the event stream and key it by the link id.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">eventsPerLinkId&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">readEvents&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">WithKeys&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Event&lt;/span>&lt;span class="o">::&lt;/span>&lt;span class="n">getLinkId&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withKeyType&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">eventsPerLinkId&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">JoinedEvent&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Store the view event.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;view&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">viewState&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Store the click event.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;click&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">clickState&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The maximum element timestamp seen so far.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;maxTimestampSeen&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">CombiningState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span>&lt;span class="o">[],&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">maxTimestamp&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">combining&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Max&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ofLongs&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Timer that fires when an hour goes by with an incomplete join.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gcTimer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TimerSpec&lt;/span> &lt;span class="n">gcTimer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">EVENT_TIME&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Timestamp&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="n">ts&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@AlwaysFetched&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;view&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">viewState&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@AlwaysFetched&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;click&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">clickState&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@AlwaysFetched&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;maxTimestampSeen&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">CombiningState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span>&lt;span class="o">[],&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">maxTimestampState&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gcTimer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">Timer&lt;/span> &lt;span class="n">gcTimer&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">JoinedEvent&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Store the event into the correct state variable.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Event&lt;/span> &lt;span class="n">event&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">valueState&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">event&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getType&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">equals&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">VIEW&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">?&lt;/span> &lt;span class="n">viewState&lt;/span> &lt;span class="o">:&lt;/span> &lt;span class="n">clickState&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">valueState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">event&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Event&lt;/span> &lt;span class="n">view&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">viewState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Event&lt;/span> &lt;span class="n">click&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">clickState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">view&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">null&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="n">click&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">null&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// We&amp;#39;ve seen both a view and a click. Output a joined event and clear state.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">output&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JoinedEvent&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">view&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">click&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">clearState&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">viewState&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">clickState&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">maxTimestampState&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// We&amp;#39;ve only seen one half of the join.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Set the timer to be one hour after the maximum timestamp seen. This will keep overwriting the same timer, so
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// as long as there is activity on this key the state will stay active. Once the key goes inactive for one hour&amp;#39;s
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// worth of event time (as measured by the watermark), then the gc timer will fire.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">maxTimestampState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">add&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getMillis&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Instant&lt;/span> &lt;span class="n">expirationTime&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Instant&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">maxTimestampState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">()).&lt;/span>&lt;span class="na">plus&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardHours&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">gcTimer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">expirationTime&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@OnTimer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gcTimer&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">onTimer&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;view&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">viewState&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;click&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">clickState&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;maxTimestampSeen&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">CombiningState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span>&lt;span class="o">[],&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">maxTimestampState&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// An hour has gone by with an incomplete join. Give up and clear the state.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">clearState&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">viewState&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">clickState&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">maxTimestampState&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">clearState&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;view&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">viewState&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;click&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Event&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">clickState&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;maxTimestampSeen&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">CombiningState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kt">long&lt;/span>&lt;span class="o">[],&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">maxTimestampState&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">viewState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">clickState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">maxTimestampState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">JoinDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Stores the view event.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">VIEW_STATE_SPEC&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ReadModifyWriteStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;view&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">EventCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Stores the click event.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CLICK_STATE_SPEC&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ReadModifyWriteStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;click&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">EventCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The maximum element timestamp value seen so far.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MAX_TIMESTAMP&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">CombiningValueStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;max_timestamp_seen&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">max&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Timer that fires when an hour goes by with an incomplete join.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">GC_TIMER&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;gc&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WATERMARK&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">view&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">VIEW_STATE_SPEC&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">click&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">CLICK_STATE_SPEC&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_timestamp_seen&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">MAX_TIMESTAMP&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ts&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampParam&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">gc&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimerParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">GC_TIMER&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">event&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">element&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">event&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">type&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s1">&amp;#39;view&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">view&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">event&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">else&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">click&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">event&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">previous_view&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">view&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">previous_click&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">click&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># We&amp;#39;ve seen both a view and a click. Output a joined event and clear state.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">previous_view&lt;/span> &lt;span class="ow">and&lt;/span> &lt;span class="n">previous_click&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">previous_view&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">previous_click&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">view&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">click&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_timestamp_seen&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">else&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_timestamp_seen&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">gc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">max_timestamp_seen&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">Duration&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">seconds&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3600&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@on_timer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">GC_TIMER&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">gc_callback&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">view&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">VIEW_STATE_SPEC&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">click&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">CLICK_STATE_SPEC&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_timestamp_seen&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">MAX_TIMESTAMP&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">view&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">click&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_timestamp_seen&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;EventsPerLinkId&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadPerLinkEvents&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Join DoFn&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">JoinDoFn&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">JoinedEvent&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">View&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">Click&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">Event&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">joinDoFn&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">View&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Value&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">Event&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1">// Store the view event.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">Click&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Value&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">Event&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1">// Store the click event.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">MaxTimestampSeen&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Combining&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1">// The maximum element timestamp seen so far.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">GcTimer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span> &lt;span class="c1">// Timer that fires when an hour goes by with an incomplete join.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">joinDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">et&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">Event&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">JoinedEvent&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">valueState&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">View&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">isClick&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">valueState&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Click&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">valueState&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">event&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">view&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">View&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">click&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Click&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">view&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="nx">click&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">JoinedEvent&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">View&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">view&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">Click&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">click&lt;/span>&lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">clearState&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MaxTimestampSeen&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">et&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Milliseconds&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">expTs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MaxTimestampSeen&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">GcTimer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">UnixMilli&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">expTs&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Hour&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">joinDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">OnTimer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">switch&lt;/span> &lt;span class="nx">timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">GcTimer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">clearState&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">joinDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">clearState&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">View&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Click&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MaxTimestampSeen&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nf">AddJoinDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">in&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">joinDoFn&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">View&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeValueState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">Event&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;view&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Click&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeValueState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">Event&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;click&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">MaxTimestampSeen&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeCombiningState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;maxTimestampSeen&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">a&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">b&lt;/span> &lt;span class="kt">int64&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">int64&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">a&lt;/span> &lt;span class="p">&amp;gt;&lt;/span> &lt;span class="nx">b&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">a&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">b&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">GcTimer&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">InEventTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;gcTimer&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">in&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="batching-rpcs">11.5.2. Batching RPCs&lt;/h4>
&lt;p>In this example, input elements are forwarded to an external RPC service. The RPC accepts batch requests:
multiple events for the same user can be sent in a single RPC call. Because the RPC service also imposes rate limits,
we batch ten seconds of events together to reduce the number of calls.&lt;/p>
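&lt;p>The buffering pattern described above can be modeled without Beam. The following is a minimal sketch in plain Python, not Beam API: the &lt;code>RpcBatcher&lt;/code> class, its &lt;code>advance_time&lt;/code> method, and the &lt;code>send_rpc&lt;/code> callback are hypothetical names used only for illustration, and time is passed in explicitly instead of being read from a clock.&lt;/p>

```python
from collections import defaultdict

BATCH_WINDOW_SECONDS = 10  # the ten-second batching interval from the text


class RpcBatcher:
    """Plain-Python model of the per-key buffer and timer (hypothetical, not Beam API)."""

    def __init__(self, send_rpc):
        self.send_rpc = send_rpc          # hypothetical callable taking (key, list of events)
        self.buffers = defaultdict(list)  # per-key bag of buffered events
        self.deadlines = {}               # per-key timer; presence means "timer is set"

    def process(self, key, event, now):
        # Always buffer the event under its key.
        self.buffers[key].append(event)
        # Only the first event of a batch arms the timer.
        if key not in self.deadlines:
            self.deadlines[key] = now + BATCH_WINDOW_SECONDS

    def advance_time(self, now):
        # Fire every timer whose deadline has passed: one RPC per key, then reset.
        due = [k for k, deadline in self.deadlines.items() if now >= deadline]
        for key in due:
            self.send_rpc(key, self.buffers.pop(key))
            del self.deadlines[key]


calls = []
batcher = RpcBatcher(lambda key, events: calls.append((key, events)))
batcher.process("alice", "e1", now=0)
batcher.process("alice", "e2", now=4)
batcher.process("bob", "e3", now=7)
batcher.advance_time(now=10)  # alice's timer (deadline 10) fires; bob's (deadline 17) does not
batcher.advance_time(now=17)  # bob's timer fires
print(calls)  # [('alice', ['e1', 'e2']), ('bob', ['e3'])]
```

&lt;p>The first event for a key arms the timer and later events only join the buffer, so each key produces at most one RPC per ten-second interval. The &lt;code>isTimerSet&lt;/code> flag plays the same role in the stateful &lt;code>DoFn&lt;/code> examples in this section.&lt;/p>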
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">perUser&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">readPerUser&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">perUser&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">OutputT&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Store the elements buffered so far.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">elements&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">bag&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Keep track of whether a timer is currently set or not.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;isTimerSet&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">StateSpec&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Boolean&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">isTimerSet&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">StateSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">value&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The processing-time timer used to publish the RPC.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;outputState&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">TimerSpec&lt;/span> &lt;span class="n">timer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpecs&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">PROCESSING_TIME&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">elementsState&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;isTimerSet&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Boolean&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">isTimerSetState&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@TimerId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;outputState&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">Timer&lt;/span> &lt;span class="n">timer&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Add the current element to the bag for this key.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">elementsState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">add&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(!&lt;/span>&lt;span class="n">MoreObjects&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">firstNonNull&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">isTimerSetState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="kc">false&lt;/span>&lt;span class="o">))&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// If there is no timer currently set, then set one to go off in 10 seconds.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">offset&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">10&lt;/span>&lt;span class="o">)).&lt;/span>&lt;span class="na">setRelative&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">isTimerSetState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@OnTimer&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;outputState&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">onTimer&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;state&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">BagState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">ValueT&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">elementsState&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@StateId&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;isTimerSet&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">ValueState&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Boolean&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">isTimerSetState&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Send an RPC containing the batched elements and clear state.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">sendRPC&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">elementsState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">elementsState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">isTimerSetState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">clear&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">BufferDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">BUFFER&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BagStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;buffer&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">EventCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">IS_TIMER_SET&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ReadModifyWriteStateSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;is_timer_set&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">BooleanCoder&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">OUTPUT&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">TimerSpec&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;output&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TimeDomain&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">REAL_TIME&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">BUFFER&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">is_timer_set&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">IS_TIMER_SET&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimerParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">OUTPUT&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="ow">not&lt;/span> &lt;span class="n">is_timer_set&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">timer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">Timestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">now&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">Duration&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">seconds&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">is_timer_set&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kc">True&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@on_timer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">OUTPUT&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">output_callback&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">BUFFER&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">is_timer_set&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">StateParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">IS_TIMER_SET&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">send_rpc&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">list&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">buffer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="p">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">buffer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">is_timer_set&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">clear&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">bufferDoFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span> &lt;span class="nx">any&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Elements&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Bag&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1">// Store the elements buffered so far.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">IsTimerSet&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Value&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">bool&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="c1">// Keep track of whether a timer is currently set or not.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">OutputElements&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ProcessingTime&lt;/span> &lt;span class="c1">// The processing-time timer used to publish the RPC.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">bufferDoFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">et&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span> &lt;span class="nx">V&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Elements&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">value&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">isSet&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">IsTimerSet&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">isSet&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">OutputElements&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">tp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Now&lt;/span>&lt;span class="p">().&lt;/span>&lt;span class="nf">Add&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Second&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">IsTimerSet&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kc">true&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">bufferDoFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="nf">OnTimer&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">tp&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Provider&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">w&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Window&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">key&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">timer&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Context&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">switch&lt;/span> &lt;span class="nx">timer&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">case&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">OutputElements&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Family&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">elements&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">_&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Elements&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">sendRpc&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">elements&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Elements&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">IsTimerSet&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Clear&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">sp&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="nx">AddBufferDoFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span> &lt;span class="nx">any&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="nx">s&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Scope&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">in&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">PCollection&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">bufferDoFn&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">]{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Elements&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeBagState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nx">V&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;elements&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">IsTimerSet&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">MakeValueState&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">bool&lt;/span>&lt;span class="p">](&lt;/span>&lt;span class="s">&amp;#34;isTimerSet&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">OutputElements&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">timers&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">InProcessingTime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;outputElements&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span> &lt;span class="nx">in&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h2 id="splittable-dofns">12. Splittable &lt;code>DoFns&lt;/code>&lt;/h2>
&lt;p>A Splittable &lt;code>DoFn&lt;/code> (SDF) enables users to create modular components containing I/Os (and some advanced
&lt;a href="https://s.apache.org/splittable-do-fn#heading=h.5cep9s8k4fxv">non-I/O use cases&lt;/a>). Modular
I/O components that can be connected to each other simplify common patterns.
For example, a popular use case is to read filenames from a message queue and then parse those
files. Traditionally, users had to either write a single I/O connector that contained the
logic for both the message queue and the file reader (increased complexity) or reuse a message
queue I/O followed by a regular &lt;code>DoFn&lt;/code> that read the file (decreased performance). With SDF,
we bring the richness of Apache Beam’s I/O APIs to a &lt;code>DoFn&lt;/code>, enabling modularity while maintaining the
performance of traditional I/O connectors.&lt;/p>
&lt;h3 id="sdf-basics">12.1. SDF basics&lt;/h3>
&lt;p>At a high level, an SDF is responsible for processing element and restriction pairs. A
restriction represents a subset of the work that processing the element would otherwise
require.&lt;/p>
&lt;p>Executing an SDF involves the following steps:&lt;/p>
&lt;ol>
&lt;li>Each element is paired with a restriction (e.g. filename is paired with offset range representing the whole file).&lt;/li>
&lt;li>Each element and restriction pair is split (e.g. offset ranges are broken up into smaller pieces).&lt;/li>
&lt;li>The runner redistributes the element and restriction pairs to several workers.&lt;/li>
&lt;li>Element and restriction pairs are processed in parallel (e.g. the file is read). Within this last step,
the element and restriction pair can pause its own processing and/or be split into further element and
restriction pairs.&lt;/li>
&lt;/ol>
&lt;p>&lt;img src="/images/sdf_high_level_overview.svg" alt="Diagram of steps that an SDF is composed of">&lt;/p>
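&lt;p>The steps above can be sketched in plain Python. This is an illustrative simulation only: it does not use the Beam SDK, and the helper names (&lt;code>initial_restriction&lt;/code>, &lt;code>split&lt;/code>, &lt;code>process&lt;/code>) and the chunk size are assumptions chosen for the example. A real runner decides how to split and where to send each pair.&lt;/p>

```python
# Illustrative simulation only; it does not use the Beam SDK.
# All function names here are hypothetical.
import os
import tempfile


def initial_restriction(file_name):
    """Step 1: pair the element with a restriction covering the whole file."""
    return (0, os.path.getsize(file_name))


def split(restriction, chunk_size):
    """Step 2: break the offset range into smaller half-open ranges."""
    start, stop = restriction
    return [(begin, min(begin + chunk_size, stop))
            for begin in range(start, stop, chunk_size)]


def process(file_name, restriction):
    """Step 4: read only the bytes covered by this restriction."""
    start, stop = restriction
    with open(file_name, "rb") as handle:
        handle.seek(start)
        return handle.read(stop - start)


# Create a small file to act as the input element.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

pairs = [(path, r) for r in split(initial_restriction(path), chunk_size=4)]
# Step 3 (redistribution to workers) is the runner's job; here each pair is
# simply processed in turn.
chunks = [process(name, r) for name, r in pairs]
os.remove(path)
```

&lt;p>Because each pair covers a disjoint offset range, the chunks can be produced in any order or on different workers and still account for the whole file exactly once.&lt;/p>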
&lt;h4 id="a-basic-sdf">12.1.1. A basic SDF&lt;/h4>
&lt;p>A basic SDF is composed of three parts: a restriction, a restriction provider, and a
restriction tracker. If you want to control the watermark, especially in a streaming
pipeline, two more components are needed: a watermark estimator provider and a watermark estimator.&lt;/p>
&lt;p>The restriction is a user-defined object that is used to represent a subset of
work for a given element. For example, we defined &lt;code>OffsetRange&lt;/code> as a restriction to represent offset
positions in &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html">Java&lt;/a>
and &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange">Python&lt;/a>.&lt;/p>
&lt;p>The restriction provider lets SDF authors override default implementations, including the ones for
splitting and sizing. In &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html">Java&lt;/a>
and &lt;a href="https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226">Go&lt;/a>,
this is the &lt;code>DoFn&lt;/code>. &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider">Python&lt;/a>
has a dedicated &lt;code>RestrictionProvider&lt;/code> type.&lt;/p>
&lt;p>The restriction tracker is responsible for tracking which subset of the restriction has been
completed during processing. For API details, read the &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html">Java&lt;/a>
and &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.RestrictionTracker">Python&lt;/a>
reference documentation.&lt;/p>
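&lt;p>To make the tracker's role concrete, here is a minimal sketch of a tracker for a half-open offset range. The class &lt;code>SimpleOffsetTracker&lt;/code> is hypothetical and much simpler than the SDK trackers: in particular, its &lt;code>try_split&lt;/code> takes no arguments and always hands off everything after the last claimed position, which is an assumption made for brevity.&lt;/p>

```python
class SimpleOffsetTracker:
    """Hypothetical tracker for a half-open offset range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_claimed = None

    def try_claim(self, position):
        """Claims position; returns False once it falls outside the range."""
        if position >= self.stop:
            return False
        self.last_claimed = position
        return True

    def try_split(self):
        """Keeps the claimed prefix and returns the unprocessed remainder."""
        if self.last_claimed is None:
            split_at = self.start
        else:
            split_at = self.last_claimed + 1
        if split_at >= self.stop:
            return None  # Nothing left to hand off.
        residual = (split_at, self.stop)
        self.stop = split_at
        return residual


tracker = SimpleOffsetTracker(0, 5)
claimed = []
residual = None
position = 0
while tracker.try_claim(position):
    claimed.append(position)
    if position == 1:
        # Simulate a split request: hand [2, 5) to another worker.
        residual = tracker.try_split()
    position += 1
```

&lt;p>After the split, the next &lt;code>try_claim&lt;/code> call fails, so the processing loop stops without touching the offsets that now belong to the residual restriction.&lt;/p>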
&lt;p>There are some built-in &lt;code>RestrictionTracker&lt;/code> implementations defined in Java:&lt;/p>
&lt;ol>
&lt;li>&lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.html">OffsetRangeTracker&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.html">GrowableOffsetRangeTracker&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.html">ByteKeyRangeTracker&lt;/a>&lt;/li>
&lt;/ol>
&lt;p>There is also a built-in &lt;code>RestrictionTracker&lt;/code> implementation in Python:&lt;/p>
&lt;ol>
&lt;li>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRestrictionTracker">OffsetRestrictionTracker&lt;/a>&lt;/li>
&lt;/ol>
&lt;p>Go also has a built-in &lt;code>RestrictionTracker&lt;/code> type:&lt;/p>
&lt;ol>
&lt;li>&lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/rtrackers/offsetrange">OffsetRangeTracker&lt;/a>&lt;/li>
&lt;/ol>
&lt;p>The watermark state is a user-defined object which is used to create a &lt;code>WatermarkEstimator&lt;/code> from a
&lt;code>WatermarkEstimatorProvider&lt;/code>. The simplest watermark state could be a &lt;code>timestamp&lt;/code>.&lt;/p>
&lt;p>The watermark estimator provider lets SDF authors define how to initialize the watermark state and
create a watermark estimator. In &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html">Java&lt;/a> and &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#ParDo">Go&lt;/a>
this is the &lt;code>DoFn&lt;/code>. &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.WatermarkEstimatorProvider">Python&lt;/a>
has a dedicated &lt;code>WatermarkEstimatorProvider&lt;/code> type.&lt;/p>
&lt;p>The watermark estimator tracks the watermark while an element-restriction pair is in progress.
For API details, read the &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimator.html">Java&lt;/a>, &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.WatermarkEstimator">Python&lt;/a>, and &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf#WatermarkEstimator">Go&lt;/a>
reference documentation.&lt;/p>
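&lt;p>As a rough illustration of how a watermark estimator relates to its watermark state, the sketch below uses a single timestamp as the state and only ever moves it forward. The class &lt;code>MonotonicEstimator&lt;/code> and its method names are assumptions for this example and are not the SDK types; the timestamps are plain integers.&lt;/p>

```python
class MonotonicEstimator:
    """Hypothetical estimator whose watermark never moves backwards."""

    def __init__(self, initial_watermark):
        # The watermark state here is a single timestamp.
        self.watermark = initial_watermark

    def observe_timestamp(self, timestamp):
        """Advances the watermark when a later timestamp is seen."""
        self.watermark = max(self.watermark, timestamp)

    def current_watermark(self):
        """Returns the current estimate."""
        return self.watermark


estimator = MonotonicEstimator(initial_watermark=100)
history = []
for event_time in [105, 103, 110, 108]:
    estimator.observe_timestamp(event_time)
    history.append(estimator.current_watermark())
```

&lt;p>Out-of-order timestamps (103 and 108 here) leave the estimate unchanged, so the reported watermark only ever moves forward.&lt;/p>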
&lt;p>There are some built-in &lt;code>WatermarkEstimator&lt;/code> implementations in Java:&lt;/p>
&lt;ol>
&lt;li>&lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.Manual.html">Manual&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.MonotonicallyIncreasing.html">MonotonicallyIncreasing&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.WallTime.html">WallTime&lt;/a>&lt;/li>
&lt;/ol>
&lt;p>Along with the default &lt;code>WatermarkEstimatorProvider&lt;/code>, Python has the same set of built-in
&lt;code>WatermarkEstimator&lt;/code> implementations:&lt;/p>
&lt;ol>
&lt;li>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.watermark_estimators.html#apache_beam.io.watermark_estimators.ManualWatermarkEstimator">ManualWatermarkEstimator&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.watermark_estimators.html#apache_beam.io.watermark_estimators.MonotonicWatermarkEstimator">MonotonicWatermarkEstimator&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.watermark_estimators.html#apache_beam.io.watermark_estimators.WalltimeWatermarkEstimator">WalltimeWatermarkEstimator&lt;/a>&lt;/li>
&lt;/ol>
&lt;p>The following &lt;code>WatermarkEstimator&lt;/code> types are implemented in Go:&lt;/p>
&lt;ol>
&lt;li>&lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf#TimestampObservingWatermarkEstimator">TimestampObservingEstimator&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf#WallTimeWatermarkEstimator">WallTimeWatermarkEstimator&lt;/a>&lt;/li>
&lt;/ol>
&lt;p>To define an SDF, you must choose whether the SDF is bounded (default) or
unbounded and define how to create the initial restriction for an element. The distinction is
based on how the amount of work is represented:&lt;/p>
&lt;ul>
&lt;li>Bounded DoFns are those where the work represented by an element is known beforehand and has
an end. Examples of bounded elements include a file or group of files.&lt;/li>
&lt;li>Unbounded DoFns are those where the amount of work does not have a specific end or is
not known beforehand. Examples of unbounded elements include a Kafka or Pub/Sub
topic.&lt;/li>
&lt;/ul>
&lt;p>In Java, you can use &lt;a href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/DoFn.UnboundedPerElement.html">@UnboundedPerElement&lt;/a>
or &lt;a href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/DoFn.BoundedPerElement.html">@BoundedPerElement&lt;/a>
to annotate your &lt;code>DoFn&lt;/code>. In Python, you can use &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.unbounded_per_element">@unbounded_per_element&lt;/a>
to annotate the &lt;code>DoFn&lt;/code>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@BoundedPerElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">private&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">FileToWordsFn&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@GetInitialRestriction&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">OffsetRange&lt;/span> &lt;span class="nf">getInitialRestriction&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">fileName&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="kd">throws&lt;/span> &lt;span class="n">IOException&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">OffsetRange&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">0&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">File&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">fileName&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">length&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">fileName&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">RestrictionTracker&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">OffsetRange&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">tracker&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">outputReceiver&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">throws&lt;/span> &lt;span class="n">IOException&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">RandomAccessFile&lt;/span> &lt;span class="n">file&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">RandomAccessFile&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">fileName&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;r&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">seekToNextRecordBoundaryInFile&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">file&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">currentRestriction&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getFrom&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">while&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">tryClaim&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">file&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getFilePointer&lt;/span>&lt;span class="o">()))&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">outputReceiver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">readNextRecord&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">file&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Providing the coder is only necessary if it can not be inferred at runtime.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@GetRestrictionCoder&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Coder&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">OffsetRange&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">getRestrictionCoder&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">OffsetRange&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Coder&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">FileToWordsRestrictionProvider&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">transforms&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">core&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">RestrictionProvider&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">initial_restriction&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">file_name&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="n">OffsetRange&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">stat&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">file_name&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">st_size&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">create_tracker&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">restriction_trackers&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">OffsetRestrictionTracker&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">restriction&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">FileToWordsFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="n">file_name&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="c1"># Alternatively, we can let FileToWordsFn itself inherit from&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="c1"># RestrictionProvider, implement the required methods and let&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="c1"># tracker=beam.DoFn.RestrictionParam() which will use self as&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="c1"># the provider.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="n">tracker&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">RestrictionParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">FileToWordsRestrictionProvider&lt;/span>&lt;span class="p">())):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">with&lt;/span> &lt;span class="nb">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">file_name&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">file_handle&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="n">file_handle&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">seek&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">current_restriction&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">start&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="k">while&lt;/span> &lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">try_claim&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">file_handle&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tell&lt;/span>&lt;span class="p">()):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="k">yield&lt;/span> &lt;span class="n">read_next_record&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">file_handle&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="c1"># Providing the coder is only necessary if it cannot be inferred at&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="c1"># runtime.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">restriction_coder&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="o">...&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">splittableDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">CreateInitialRestriction&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">filename&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Start&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">End&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nf">getFileLength&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">filename&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">splittableDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">CreateTracker&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rest&lt;/span> &lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">LockRTracker&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewLockRTracker&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">NewTracker&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rest&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">splittableDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rt&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">LockRTracker&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">filename&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">file&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">filename&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">offset&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">seekToNextRecordBoundaryInFile&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">file&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">rt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GetRestriction&lt;/span>&lt;span class="p">().(&lt;/span>&lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nx">Start&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nx">rt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">TryClaim&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">offset&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">record&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">newOffset&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">readNextRecord&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">file&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">record&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">offset&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">newOffset&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>At this point, we have an SDF that supports &lt;a href="#runner-initiated-split">runner-initiated splits&lt;/a>,
which enable dynamic work rebalancing. To speed up the initial parallelization of work, or to support
runners that do not implement runner-initiated splitting, we recommend providing
a set of initial splits:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@SplitRestriction&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kt">void&lt;/span> &lt;span class="nf">splitRestriction&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Restriction&lt;/span> &lt;span class="n">OffsetRange&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">OffsetRange&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">splitReceiver&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">long&lt;/span> &lt;span class="n">splitSize&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">64&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span> &lt;span class="o">&amp;lt;&amp;lt;&lt;/span> &lt;span class="n">20&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">long&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getFrom&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">while&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">i&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTo&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">splitSize&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Compute and output 64 MiB size ranges to process in parallel
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kt">long&lt;/span> &lt;span class="n">end&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">splitSize&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">splitReceiver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">OffsetRange&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">end&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">i&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">end&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Output the last range
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">splitReceiver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">OffsetRange&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTo&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">FileToWordsRestrictionProvider&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">transforms&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">core&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">RestrictionProvider&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">split&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">file_name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Compute and output 64 MiB size ranges to process in parallel&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">split_size&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">64&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="o">&amp;lt;&amp;lt;&lt;/span> &lt;span class="mi">20&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">i&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">start&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">while&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">end&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">split_size&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">OffsetRange&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">split_size&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">i&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="n">split_size&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">OffsetRange&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">end&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">splittableDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">SplitRestriction&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">filename&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">rest&lt;/span> &lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">splits&lt;/span> &lt;span class="p">[]&lt;/span>&lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">size&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="mi">64&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span> &lt;span class="o">&amp;lt;&amp;lt;&lt;/span> &lt;span class="mi">20&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">i&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">rest&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Start&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nx">i&lt;/span> &lt;span class="p">&amp;lt;&lt;/span> &lt;span class="nx">rest&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">End&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="nx">size&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Compute and output 64 MiB size ranges to process in parallel
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">end&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">i&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="nx">size&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">splits&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nb">append&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">splits&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">i&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">end&lt;/span>&lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">i&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">end&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Output the last range
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">splits&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nb">append&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">splits&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">i&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">rest&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">End&lt;/span>&lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">splits&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
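&lt;p>The splitting arithmetic used in the snippets above can be checked in isolation. The following is a minimal
sketch in plain Python that does not depend on the Beam SDK; the helper name &lt;code>initial_splits&lt;/code>,
the tuple representation of a range, and the 200 MiB file size are assumptions made for illustration only.&lt;/p>

```python
MIB = 1024 * 1024


def initial_splits(start, end, split_size=64 * MIB):
    """Split [start, end) into consecutive ranges of at most split_size bytes."""
    ranges = []
    i = start
    # Emit full-size ranges while more than split_size bytes remain.
    while end - i > split_size:
        ranges.append((i, i + split_size))
        i += split_size
    # The last range covers whatever is left over.
    ranges.append((i, end))
    return ranges


# A hypothetical 200 MiB file yields three 64 MiB ranges and an 8 MiB tail.
splits = initial_splits(0, 200 * MIB)
sizes_in_mib = [(stop - begin) // MIB for begin, stop in splits]
```

&lt;p>The ranges are contiguous and together cover the original restriction exactly, which is the property an
initial split needs to preserve.&lt;/p>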
&lt;h3 id="sizing-and-progress">12.2. Sizing and progress&lt;/h3>
&lt;p>Runners use sizing and progress information during the execution of an SDF to make
informed decisions about which restrictions to split and how to parallelize work.&lt;/p>
&lt;p>Before processing an element and restriction, a runner may use an initial size to choose
how the restrictions are processed and who processes them, in an attempt to improve the initial balancing
and parallelization of work. During the processing of an element and restriction, sizing and progress are
used to choose which restrictions to split and who should process them.&lt;/p>
&lt;p>By default, we use the restriction tracker’s estimate for work remaining, falling back to assuming
that all restrictions have an equal cost. To override the default, SDF authors can provide the
appropriate method within the restriction provider. SDF authors need to be aware that the
sizing method will be invoked concurrently during bundle processing because of runner-initiated splitting
and progress estimation.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@GetSize&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kt">double&lt;/span> &lt;span class="nf">getSize&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">fileName&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="nd">@Restriction&lt;/span> &lt;span class="n">OffsetRange&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">fileName&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">contains&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;expensiveRecords&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">?&lt;/span> &lt;span class="n">2&lt;/span> &lt;span class="o">:&lt;/span> &lt;span class="n">1&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">*&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTo&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getFrom&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The RestrictionProvider is responsible for calculating the size of a given&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># restriction.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MyRestrictionProvider&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">transforms&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">core&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">RestrictionProvider&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">restriction_size&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">file_name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">weight&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">2&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="s2">&amp;#34;expensiveRecords&amp;#34;&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">file_name&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">size&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">weight&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">splittableDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">RestrictionSize&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">filename&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">rest&lt;/span> &lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="kt">float64&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">weight&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nb">float64&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">strings&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Contains&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">filename&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s">&amp;#34;expensiveRecords&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">weight&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="mi">2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">weight&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="nb">float64&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rest&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">End&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="nx">rest&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Start&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
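&lt;p>To see the effect of a weighted size, consider the minimal sketch below. It is plain Python with no Beam
dependency; the file names and offsets are hypothetical, and choosing the largest restriction is only one way
a runner might use these numbers, not a description of any particular runner. Because the function reads and
writes no shared state, concurrent invocations cannot interfere with each other.&lt;/p>

```python
def restriction_size(file_name, start, end):
    """Estimate the work in [start, end), doubling it for costly files."""
    weight = 2 if "expensiveRecords" in file_name else 1
    # Pure function of its arguments: no shared state is read or written,
    # so concurrent calls cannot interfere with each other.
    return float(weight * (end - start))


# Hypothetical pending work: (file name, start offset, end offset).
pending = [
    ("cheap.txt", 0, 150),
    ("expensiveRecords.txt", 0, 100),
]
sizes = {name: restriction_size(name, begin, stop) for name, begin, stop in pending}

# The shorter range is reported as the larger amount of work.
largest = max(sizes, key=sizes.get)
```

&lt;p>Here the 100-byte range in the expensive file is reported as more work (200.0) than the 150-byte range in
the cheap file (150.0), so a size-aware scheduler would treat it as the better candidate for splitting.&lt;/p>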
&lt;h3 id="user-initiated-checkpoint">12.3. User-initiated checkpoint&lt;/h3>
&lt;p>Some I/Os cannot produce all of the data necessary to complete a restriction within the lifetime of a
single bundle. This typically happens with unbounded restrictions, but can also happen with bounded
restrictions. For example, there could be more data that needs to be ingested but is not available yet.
Another cause of this scenario is the source system throttling your data.&lt;/p>
&lt;p>Your SDF can signal to the runner that it is not done processing the current restriction. This
signal can suggest a time at which to resume. The runner tries to honor the resume time, but this is not
guaranteed. In the meantime, execution can continue on a restriction that has available work, improving
resource utilization.&lt;/p>
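&lt;p>The control flow of a user-initiated checkpoint can be illustrated without the Beam SDK. In the minimal
sketch below, the function name &lt;code>process&lt;/code>, the dictionary standing in for the source system, and
the 10-second delay are all assumptions made for illustration; a real SDF would return the SDK's own
continuation signal instead of a tuple.&lt;/p>

```python
def process(restriction, available):
    """Claim offsets in [start, end) that have data; ask to resume otherwise.

    Returns (outputs, residual, resume_delay_seconds). A residual of None
    means the restriction is finished.
    """
    start, end = restriction
    outputs = []
    position = start
    while end > position:
        if position not in available:
            # No data yet: hand back the unprocessed remainder and suggest a delay.
            return outputs, (position, end), 10
        outputs.append(available[position])
        position += 1
    return outputs, None, 0


# First attempt: only offsets 0 and 1 have data, so [2, 4) is deferred.
data = {0: "a", 1: "b"}
first_out, residual, delay = process((0, 4), data)

# Later the remaining data arrives and the residual is resumed.
data.update({2: "c", 3: "d"})
second_out, done, _ = process(residual, data)
```

&lt;p>Between the two calls, the worker is free to process other restrictions that already have data available.&lt;/p>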
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="n">ProcessContinuation&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">RestrictionTracker&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">OffsetRange&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">tracker&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">RecordPosition&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">outputReceiver&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">long&lt;/span> &lt;span class="n">currentPosition&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">currentRestriction&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getFrom&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Service&lt;/span> &lt;span class="n">service&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">initializeService&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">try&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">while&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">RecordPosition&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">records&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">readNextRecords&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">currentPosition&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">records&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">isEmpty&lt;/span>&lt;span class="o">())&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Return a short delay if there is no data to process at the moment.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="n">ProcessContinuation&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">resume&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">withResumeDelay&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">10&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">RecordPosition&lt;/span> &lt;span class="n">record&lt;/span> &lt;span class="o">:&lt;/span> &lt;span class="n">records&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(!&lt;/span>&lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">tryClaim&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">record&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getPosition&lt;/span>&lt;span class="o">()))&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">ProcessContinuation&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">stop&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">currentPosition&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">record&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getPosition&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">1&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">outputReceiver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">record&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span> &lt;span class="k">catch&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">ThrottlingException&lt;/span> &lt;span class="n">exception&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Return a longer delay in case we are being throttled.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="n">ProcessContinuation&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">resume&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">withResumeDelay&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">60&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MySplittableDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">restriction_tracker&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">RestrictionParam&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyRestrictionProvider&lt;/span>&lt;span class="p">())):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">current_position&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">restriction_tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">current_restriction&lt;/span>&lt;span class="p">()&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">start&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">while&lt;/span> &lt;span class="kc">True&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Pull records from an external service.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">try&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">records&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">external_service&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fetch&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">current_position&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">records&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">empty&lt;/span>&lt;span class="p">():&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Set a short delay if there is no data to process at the moment.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">restriction_tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">defer_remainder&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">timestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">seconds&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">record&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">records&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">restriction_tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">try_claim&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">record&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">position&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">current_position&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">record&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">position&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">record&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">else&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">except&lt;/span> &lt;span class="ne">TimeoutError&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Set a longer delay in case we are being throttled.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">restriction_tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">defer_remainder&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">timestamp&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">seconds&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">60&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">checkpointingSplittableDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rt&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">LockRTracker&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">Record&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ProcessContinuation&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">error&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">position&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">rt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GetRestriction&lt;/span>&lt;span class="p">().(&lt;/span>&lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nx">Start&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">records&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ExternalService&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">readNextRecords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">position&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="nx">fn&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ExternalService&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">ThrottlingErr&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Resume at a later time to avoid throttling.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ResumeProcessingIn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">60&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Second&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">StopProcessing&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nb">len&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">records&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Wait for data to be available.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ResumeProcessingIn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Second&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nx">_&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">record&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="k">range&lt;/span> &lt;span class="nx">records&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">rt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">TryClaim&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">position&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The claim failed, so the rest of the restriction is no longer ours; finish processing.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">StopProcessing&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">position&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">record&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="runner-initiated-split">12.4. Runner-initiated split&lt;/h3>
&lt;p>A runner may attempt to split a restriction at any time while it is being processed. This lets the
runner either pause processing of the restriction so that other work can be done (common for
unbounded restrictions, to limit the amount of output and/or improve latency) or split the restriction
into two pieces, increasing the available parallelism within the system. Different runners (e.g.,
Dataflow, Flink, Spark) use different strategies to issue splits under batch and streaming
execution.&lt;/p>
&lt;p>Write your SDF with this in mind, because the end of the restriction can change while it is being
processed. In the processing loop, rely on the result of each attempt to claim a piece of the
restriction instead of assuming you can process until the original end.&lt;/p>
&lt;p>The following example is incorrect: it caches the end of the restriction in a local variable and ignores the result of the claim:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">badTryClaimLoop&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">fileName&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">RestrictionTracker&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">OffsetRange&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">tracker&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">OutputReceiver&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">outputReceiver&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">throws&lt;/span> &lt;span class="n">IOException&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">RandomAccessFile&lt;/span> &lt;span class="n">file&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">RandomAccessFile&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">fileName&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;r&amp;#34;&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">seekToNextRecordBoundaryInFile&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">file&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">currentRestriction&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getFrom&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The restriction tracker can be modified by another thread in parallel
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// so storing state locally is ill advised.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kt">long&lt;/span> &lt;span class="n">end&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">currentRestriction&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">getTo&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">while&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">file&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getFilePointer&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">end&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Only after successfully claiming should we produce any output and/or
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// perform side effects.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">tryClaim&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">file&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getFilePointer&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">outputReceiver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">readNextRecord&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">file&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">BadTryClaimLoop&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">file_name&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">tracker&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">RestrictionParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">FileToWordsRestrictionProvider&lt;/span>&lt;span class="p">())):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="nb">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">file_name&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">file_handle&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">file_handle&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">seek&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">current_restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">start&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The restriction tracker can be modified by another thread in parallel&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># so storing state locally is ill advised.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">end&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">current_restriction&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">end&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">while&lt;/span> &lt;span class="n">file_handle&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tell&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">end&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Only after successfully claiming should we produce any output and/or&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># perform side effects.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">tracker&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">try_claim&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">file_handle&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tell&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">yield&lt;/span> &lt;span class="n">read_next_record&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">file_handle&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">badTryClaimLoop&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rt&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">LockRTracker&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">filename&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">emit&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">file&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">os&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">filename&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">offset&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">seekToNextRecordBoundaryInFile&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">file&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">rt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GetRestriction&lt;/span>&lt;span class="p">().(&lt;/span>&lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nx">Start&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="nx">err&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="kc">nil&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">err&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The restriction tracker can be modified by another thread in parallel
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// so storing state locally is ill advised.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">end&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">rt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GetRestriction&lt;/span>&lt;span class="p">().(&lt;/span>&lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nx">End&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="nx">offset&lt;/span> &lt;span class="p">&amp;lt;&lt;/span> &lt;span class="nx">end&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Only after successfully claiming should we produce any output and/or
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// perform side effects.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">rt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">TryClaim&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">offset&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">record&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">newOffset&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nf">readNextRecord&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">file&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nf">emit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">record&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">offset&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">newOffset&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="watermark-estimation">12.5. Watermark estimation&lt;/h3>
&lt;p>The default watermark estimator does not produce a watermark estimate. Therefore, the output watermark
is computed solely from the minimum of the upstream watermarks.&lt;/p>
&lt;p>An SDF can advance the output watermark by specifying a lower bound for all future output
that this element and restriction pair will produce. The runner computes the output watermark
by taking the minimum over all upstream watermarks and the watermark reported by each element and
restriction pair. The reported watermark must monotonically increase for each element and restriction
pair across bundle boundaries. When an element and restriction pair stops processing, its watermark
is no longer considered part of the above calculation.&lt;/p>
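&lt;p>The calculation described above can be sketched in plain Python. This is only an illustration under
stated assumptions, not how any particular runner is implemented: the helper name
&lt;code>output_watermark&lt;/code>, the pair labels, and the use of float seconds for timestamps are all
hypothetical.&lt;/p>

```python
from typing import Dict, Iterable, Optional


def output_watermark(
    upstream_watermarks: Iterable[float],
    reported_by_pair: Dict[str, Optional[float]],
) -> float:
    """Hypothetical helper: minimum over upstream and per-pair watermarks.

    A pair mapped to None either stopped processing or uses the default
    estimator (which reports nothing), so it is left out of the minimum.
    """
    candidates = list(upstream_watermarks)
    candidates.extend(w for w in reported_by_pair.values() if w is not None)
    # Assumes at least one upstream watermark; min() raises on an empty list.
    return min(candidates)


# Two upstream watermarks and three element/restriction pairs (seconds).
upstream = [120.0, 150.0]
pairs = {"file-a[0,100)": 90.0, "file-b[0,50)": 130.0, "file-c[0,10)": None}
print(output_watermark(upstream, pairs))  # 90.0
```

&lt;p>In this sketch the slowest active pair holds the output watermark back at 90.0. Once that pair
stops processing and drops out of the mapping, the result rises to the upstream minimum of 120.0.&lt;/p>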
&lt;p>Tips:&lt;/p>
&lt;ul>
&lt;li>If you author an SDF that outputs records with timestamps, you should expose ways to allow users of
this SDF to configure which watermark estimator to use.&lt;/li>
&lt;li>Any data produced with a timestamp earlier than the watermark may be considered late. See
&lt;a href="#watermarks-and-late-data">watermarks and late data&lt;/a> for more details.&lt;/li>
&lt;/ul>
&lt;h4 id="controlling-the-watermark">12.5.1. Controlling the watermark&lt;/h4>
&lt;p>There are two general types of watermark estimators: timestamp observing and external clock observing.
Timestamp observing watermark estimators use the output timestamp of each record to compute the watermark
estimate, while external clock observing watermark estimators control the watermark by using a clock that
is not associated with any individual output, such as the local clock of the machine or a clock exposed
through an external service.&lt;/p>
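&lt;p>To make the distinction concrete, here is a minimal standalone sketch of the two styles. The class
names &lt;code>TimestampObservingSketch&lt;/code> and &lt;code>WallClockSketch&lt;/code> are hypothetical and do
not extend any Beam base class. They only illustrate where each style gets its estimate from and how
both keep the reported value from moving backwards.&lt;/p>

```python
import time


class TimestampObservingSketch:
    """Hypothetical estimator driven by the timestamps of emitted records."""

    def __init__(self, initial):
        self._watermark = initial

    def observe_timestamp(self, timestamp):
        # Keep the largest timestamp seen so the estimate never moves backwards.
        # A record observed later with a smaller timestamp would count as late.
        self._watermark = max(self._watermark, timestamp)

    def current_watermark(self):
        return self._watermark


class WallClockSketch:
    """Hypothetical estimator driven by a clock instead of output timestamps."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._watermark = clock()

    def current_watermark(self):
        # Clamp against the last value in case the clock steps backwards.
        self._watermark = max(self._watermark, self._clock())
        return self._watermark


observing = TimestampObservingSketch(initial=0.0)
for ts in [5.0, 3.0, 8.0]:
    observing.observe_timestamp(ts)
print(observing.current_watermark())  # 8.0
```

&lt;p>The clock is passed in as a parameter so that the wall-clock variant can be exercised with a fake
clock in tests. That is a design choice of this sketch, not something the Beam API requires.&lt;/p>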
&lt;p>The watermark estimator provider lets you override the default watermark estimation logic and use an existing
watermark estimator implementation. You can also provide your own watermark estimator implementation.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// (Optional) Define a custom watermark state type to save information between bundle
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// processing rounds.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">MyCustomWatermarkState&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="nf">MyCustomWatermarkState&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">OffsetRange&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Store data necessary for future watermark computations
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// (Optional) Choose which coder to use to encode the watermark estimator state.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nd">@GetWatermarkEstimatorStateCoder&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Coder&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">MyCustomWatermarkState&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">getWatermarkEstimatorStateCoder&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">AvroCoder&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MyCustomWatermarkState&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Define a WatermarkEstimator
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">MyCustomWatermarkEstimator&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">implements&lt;/span> &lt;span class="n">TimestampObservingWatermarkEstimator&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">MyCustomWatermarkState&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="nf">MyCustomWatermarkEstimator&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MyCustomWatermarkState&lt;/span> &lt;span class="n">type&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Initialize watermark estimator state
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">observeTimestamp&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Instant&lt;/span> &lt;span class="n">timestamp&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Will be invoked on each output from the SDF
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Instant&lt;/span> &lt;span class="nf">currentWatermark&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Return a monotonically increasing value
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="n">currentWatermark&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">MyCustomWatermarkState&lt;/span> &lt;span class="nf">getState&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Return state to resume future watermark estimation after a checkpoint/split
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="kc">null&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Then, update the DoFn to generate the initial watermark estimator state for all new element
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// and restriction pairs and to create a new instance given watermark estimator state.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@GetInitialWatermarkEstimatorState&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">MyCustomWatermarkState&lt;/span> &lt;span class="nf">getInitialWatermarkEstimatorState&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="nd">@Restriction&lt;/span> &lt;span class="n">OffsetRange&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Compute and return the initial watermark estimator state for each element and
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// restriction. All subsequent processing of an element and restriction will be restored
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// from the existing state.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">MyCustomWatermarkState&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@NewWatermarkEstimator&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">WatermarkEstimator&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">MyCustomWatermarkState&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">newWatermarkEstimator&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@WatermarkEstimatorState&lt;/span> &lt;span class="n">MyCustomWatermarkState&lt;/span> &lt;span class="n">oldState&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">MyCustomWatermarkEstimator&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">oldState&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># (Optional) Define a custom watermark state type to save information between&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># bundle processing rounds.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MyCustomerWatermarkEstimatorState&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">object&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Store data necessary for future watermark computations&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">pass&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Define a WatermarkEstimator&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MyCustomWatermarkEstimator&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">WatermarkEstimator&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">estimator_state&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">state&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">estimator_state&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">observe_timestamp&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">timestamp&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Will be invoked on each output from the SDF&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">pass&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">current_watermark&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Return a monotonically increasing value&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">current_watermark&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">get_estimator_state&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Return state to resume future watermark estimation after a&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># checkpoint/split&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">state&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Then, a WatermarkEstimatorProvider needs to be created for this&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># WatermarkEstimator&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MyWatermarkEstimatorProvider&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">WatermarkEstimatorProvider&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">initial_estimator_state&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">MyCustomerWatermarkEstimatorState&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">create_watermark_estimator&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">estimator_state&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">MyCustomWatermarkEstimator&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">estimator_state&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Finally, define the SDF using your estimator.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MySplittableDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">restriction_tracker&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">RestrictionParam&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">MyRestrictionProvider&lt;/span>&lt;span class="p">()),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">watermark_estimator&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WatermarkEstimatorParam&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyWatermarkEstimatorProvider&lt;/span>&lt;span class="p">())):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># The current watermark can be inspected.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">watermark_estimator&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">current_watermark&lt;/span>&lt;span class="p">()&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// WatermarkState is a custom type.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">//
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// It is optional to write your own state type when making a custom estimator.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">WatermarkState&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Watermark&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Time&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// CustomWatermarkEstimator is a custom watermark estimator.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// You may use any type here, including some of Beam&amp;#39;s built-in watermark estimator types,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// e.g. sdf.WallTimeWatermarkEstimator, sdf.TimestampObservingWatermarkEstimator, and sdf.ManualWatermarkEstimator.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">type&lt;/span> &lt;span class="nx">CustomWatermarkEstimator&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">state&lt;/span> &lt;span class="nx">WatermarkState&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// CurrentWatermark returns the current watermark and is invoked on DoFn splits and self-checkpoints.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Watermark estimators must implement CurrentWatermark() time.Time
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">e&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">CustomWatermarkEstimator&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">CurrentWatermark&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Time&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">e&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Watermark&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ObserveTimestamp is called on the output timestamps of all
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// emitted elements to update the watermark. It is optional.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">e&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">CustomWatermarkEstimator&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ObserveTimestamp&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ts&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Time&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
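&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Only move forward so the reported watermark stays monotonic.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="nx">ts&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">After&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">e&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Watermark&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">e&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Watermark&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">ts&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>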
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// InitialWatermarkEstimatorState defines an initial state used to initialize the watermark
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// estimator. It is optional. If this is not defined, WatermarkEstimatorState may not be
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// defined and CreateWatermarkEstimator must not take in parameters.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">weDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">InitialWatermarkEstimatorState&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">et&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">EventTime&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">rest&lt;/span> &lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">element&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">WatermarkState&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Return some watermark state
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="nx">WatermarkState&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">Watermark&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Now&lt;/span>&lt;span class="p">()}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// CreateWatermarkEstimator creates the watermark estimator used by this Splittable DoFn.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Must take in a state parameter if InitialWatermarkEstimatorState is defined, otherwise takes no parameters.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">weDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">CreateWatermarkEstimator&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">initialState&lt;/span> &lt;span class="nx">WatermarkState&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">CustomWatermarkEstimator&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="nx">CustomWatermarkEstimator&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">state&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">initialState&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// WatermarkEstimatorState returns the state used to resume future watermark estimation
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// after a checkpoint/split. It is required if InitialWatermarkEstimatorState is defined,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// otherwise it must not be defined.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">weDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">WatermarkEstimatorState&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">e&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">CustomWatermarkEstimator&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">WatermarkState&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">e&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">state&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ProcessElement is the method to execute for each element.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// It can optionally take in a watermark estimator.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">weDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">e&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">CustomWatermarkEstimator&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">element&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">e&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">state&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Watermark&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Now&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="truncating-during-drain">12.6. Truncating during drain&lt;/h3>
&lt;p>Runners that support draining pipelines need to be able to drain SDFs; otherwise, the
pipeline may never stop. By default, bounded restrictions process the remainder of the restriction, while
unbounded restrictions finish processing at the next SDF-initiated checkpoint or runner-initiated split.
You can override this default behavior by defining the appropriate method on the restriction
provider.&lt;/p>
&lt;p class="language-go">Note: Once the pipeline drain starts and the truncate restriction transform is triggered, the &lt;code>sdf.ProcessContinuation&lt;/code>
will not be rescheduled.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@TruncateRestriction&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@Nullable&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">TruncateResult&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">OffsetRange&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">truncateRestriction&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Element&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">fileName&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="nd">@Restriction&lt;/span> &lt;span class="n">OffsetRange&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">fileName&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">contains&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;optional&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Skip optional files
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="kc">null&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">TruncateResult&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">restriction&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MyRestrictionProvider&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">transforms&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">core&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">RestrictionProvider&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">truncate&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">file_name&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">if&lt;/span> &lt;span class="s2">&amp;#34;optional&amp;#34;&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">file_name&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="c1"># Skip optional files&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">      &lt;span class="k">return&lt;/span> &lt;span class="kc">None&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="n">restriction&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// TruncateRestriction is a transform that is triggered when the pipeline starts to drain. It helps the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// pipeline finish sooner by truncating the restriction.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">splittableDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">TruncateRestriction&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">rt&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">LockRTracker&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">element&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">start&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">rt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GetRestriction&lt;/span>&lt;span class="p">().(&lt;/span>&lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nx">Start&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">prevEnd&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">rt&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">GetRestriction&lt;/span>&lt;span class="p">().(&lt;/span>&lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">).&lt;/span>&lt;span class="nx">End&lt;/span>
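&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Truncate the restriction to its first half, measured from start.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">newEnd&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">start&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">prevEnd&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="nx">start&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="mi">2&lt;/span>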
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nx">offsetrange&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Restriction&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Start&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">start&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">End&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">newEnd&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
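&lt;p>The default drain behavior described above can be modelled without Beam. The following is a minimal Python sketch, not Beam's implementation: &lt;code>OffsetRange&lt;/code>, &lt;code>default_truncate&lt;/code>, and &lt;code>truncate_to_half&lt;/code> are hypothetical names, and returning &lt;code>None&lt;/code> is assumed to mean "drop the remaining work", as in the Python snippet above.&lt;/p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical stand-in for an offset-based restriction [start, stop)."""
    start: int
    stop: int


def default_truncate(restriction, is_bounded):
    """Model of the default: keep bounded work, drop unbounded work."""
    return restriction if is_bounded else None


def truncate_to_half(restriction):
    """Custom policy: keep only the first half of the remaining range."""
    midpoint = restriction.start + (restriction.stop - restriction.start) // 2
    return OffsetRange(restriction.start, midpoint)


remaining = OffsetRange(100, 200)
print(default_truncate(remaining, is_bounded=True))   # bounded: unchanged
print(default_truncate(remaining, is_bounded=False))  # unbounded: dropped
print(truncate_to_half(remaining))                    # custom: first half only
```

&lt;p>Computing the midpoint from &lt;code>start&lt;/code>, rather than halving &lt;code>stop&lt;/code>, keeps the truncated range inside the original one when the range does not begin at zero.&lt;/p>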
&lt;h3 id="bundle-finalization">12.7. Bundle finalization&lt;/h3>
&lt;p>Bundle finalization enables a &lt;code>DoFn&lt;/code> to perform side effects by registering a callback.
The callback is invoked once the runner has acknowledged that it has durably persisted the output.
For example, a message queue might need to acknowledge messages that it has ingested into the pipeline.
Bundle finalization is not limited to SDFs, but it is called out here because SDFs are its primary
use case.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">BundleFinalizer&lt;/span> &lt;span class="n">bundleFinalizer&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ... produce output ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">bundleFinalizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">afterBundleCommit&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Instant&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">now&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">plus&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">5&lt;/span>&lt;span class="o">)),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">()&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ... perform a side effect ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">});&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MySplittableDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">bundle_finalizer&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">BundleFinalizerParam&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="c1"># ... produce output ...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="c1"># Register callback function for this bundle that performs the side&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="c1"># effect.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">bundle_finalizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">register&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">my_callback_func&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">func&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="nx">fn&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">splittableDoFn&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="nf">ProcessElement&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">bf&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">BundleFinalization&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">rt&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="nx">sdf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">LockRTracker&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">element&lt;/span> &lt;span class="kt">string&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ... produce output ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">bf&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">RegisterCallback&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="nx">time&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">Minute&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kd">func&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="kt">error&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// ... perform a side effect ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="kc">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
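&lt;p>To make the ordering concrete, here is a toy Python model of bundle finalization: callbacks are registered while elements are processed and run only after the bundle is committed. It is a sketch, not Beam's implementation. &lt;code>ToyBundleFinalizer&lt;/code> and &lt;code>on_bundle_committed&lt;/code> are hypothetical names, and the expiry handling is an assumption about what the five-minute argument in the Java and Go snippets means.&lt;/p>

```python
import time


class ToyBundleFinalizer:
    """Hypothetical stand-in for a runner-side finalizer; not a Beam class."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback, valid_for_seconds=300.0):
        # Store the callback with the time after which it is considered stale.
        self._callbacks.append((time.monotonic() + valid_for_seconds, callback))

    def on_bundle_committed(self):
        # Invoked by the toy runner once the bundle output is durably persisted.
        now = time.monotonic()
        ran = 0
        for expiry, callback in self._callbacks:
            if now > expiry:
                continue  # assumed semantics: expired callbacks are skipped
            callback()
            ran += 1
        self._callbacks.clear()
        return ran


acked = []
finalizer = ToyBundleFinalizer()


def process(message_id):
    # ... produce output ...
    # Acknowledge the message only after the bundle has been committed.
    finalizer.register(lambda: acked.append(message_id))


for message_id in ["m1", "m2"]:
    process(message_id)

print(acked)                      # nothing acknowledged before the commit
finalizer.on_bundle_committed()
print(acked)                      # both messages acknowledged after it
```

&lt;p>Deferring the acknowledgement this way means a message is not acknowledged to the source for output that was never persisted.&lt;/p>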
&lt;h2 id="multi-language-pipelines">13. Multi-language pipelines&lt;/h2>
&lt;p>This section provides comprehensive documentation of multi-language pipelines. To get started creating a multi-language pipeline, see:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/sdks/python-multi-language-pipelines">Python multi-language pipelines quickstart&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/sdks/java-multi-language-pipelines">Java multi-language pipelines quickstart&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Beam lets you combine transforms written in any supported SDK language (currently, Java and Python) and use them in one multi-language pipeline. This capability makes it easy to provide new functionality simultaneously in different Apache Beam SDKs through a single cross-language transform. For example, the &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py">Apache Kafka connector&lt;/a> and &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/sql.py">SQL transform&lt;/a> from the Java SDK can be used in Python pipelines.&lt;/p>
&lt;p>Pipelines that use transforms from more than one SDK language are known as &lt;em>multi-language pipelines&lt;/em>.&lt;/p>
&lt;p class="language-yaml">Beam YAML is built entirely on top of cross-language transforms.
In addition to the built-in transforms, you can author your own transforms
(using the full expressivity of the Beam API) and surface them via a concept
called &lt;a href="https://beam.apache.org/documentation/sdks/yaml/#providers">providers&lt;/a>.&lt;/p>
&lt;h3 id="create-x-lang-transforms">13.1. Creating cross-language transforms&lt;/h3>
&lt;p>To make transforms written in one language available to pipelines written in another language, Beam uses an &lt;em>expansion service&lt;/em>, which creates and injects the appropriate language-specific pipeline fragments into the pipeline.&lt;/p>
&lt;p>In the following example, a Beam Python pipeline starts up a local Java expansion service to create and inject the appropriate Java pipeline fragments for executing the Java Kafka cross-language transform into the Python pipeline. The SDK then downloads and stages the Java dependencies needed to execute these transforms.&lt;/p>
&lt;p>&lt;img src="/images/multi-language-pipelines-diagram.svg" alt="Diagram of multi-language pipeline execution flow.">&lt;/p>
&lt;p>At runtime, the Beam runner will execute both Python and Java transforms to run the pipeline.&lt;/p>
&lt;p>In this section, we will use &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.Read.html">KafkaIO.Read&lt;/a> to illustrate how to create a cross-language transform for Java and a test example for Python.&lt;/p>
&lt;h4 id="1311-creating-cross-language-java-transforms">13.1.1. Creating cross-language Java transforms&lt;/h4>
&lt;p>There are two ways to make Java transforms available to other SDKs.&lt;/p>
&lt;ul>
&lt;li>Option 1: In some cases, you can use existing Java transforms from other SDKs without writing any additional Java code.&lt;/li>
&lt;li>Option 2: You can use arbitrary Java transforms from other SDKs by adding a few Java classes.&lt;/li>
&lt;/ul>
&lt;h5 id="13111-using-existing-java-transforms-without-writing-more-java-code">13.1.1.1. Using existing Java transforms without writing more Java code&lt;/h5>
&lt;p>Starting with Beam 2.34.0, Python SDK users can use some Java transforms without writing additional Java code. This can be useful in many cases. For example:&lt;/p>
&lt;ul>
&lt;li>A developer not familiar with Java may need to use an existing Java transform from a Python pipeline.&lt;/li>
&lt;li>A developer may need to make an existing Java transform available to a Python pipeline without writing/releasing more Java code.&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> This feature is currently only available when using Java transforms from a Python pipeline.&lt;/p>
&lt;/blockquote>
&lt;p>To be eligible for direct usage, the API of the Java transform has to meet the following requirements:&lt;/p>
&lt;ol>
&lt;li>The Java transform can be constructed using an available public constructor or a public static method (a constructor method) in the same Java class.&lt;/li>
&lt;li>The Java transform can be configured using one or more builder methods. Each builder method should be public and should return an instance of the Java transform.&lt;/li>
&lt;/ol>
&lt;p>Here&amp;rsquo;s an example Java class that can be directly used from the Python API.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">JavaDataGenerator&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PBegin&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span> &lt;span class="o">.&lt;/span> &lt;span class="o">.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The following method satisfies requirement 1.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Note that you could use a class constructor instead of a static method.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="n">JavaDataGenerator&lt;/span> &lt;span class="nf">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Integer&lt;/span> &lt;span class="n">size&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">JavaDataGenerator&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">size&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">JavaDataGeneratorConfig&lt;/span> &lt;span class="kd">implements&lt;/span> &lt;span class="n">Serializable&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">prefix&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">long&lt;/span> &lt;span class="n">length&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">suffix&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span> &lt;span class="o">.&lt;/span> &lt;span class="o">.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The following method conforms to requirement 2.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kd">public&lt;/span> &lt;span class="n">JavaDataGenerator&lt;/span> &lt;span class="nf">withJavaDataGeneratorConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JavaDataGeneratorConfig&lt;/span> &lt;span class="n">dataConfig&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">JavaDataGenerator&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">size&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">dataConfig&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span> &lt;span class="o">.&lt;/span> &lt;span class="o">.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For a complete example, see &lt;a href="https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaDataGenerator.java">JavaDataGenerator&lt;/a>.&lt;/p>
&lt;p>To use a Java class that conforms to the above requirements from a Python SDK pipeline, follow these steps:&lt;/p>
&lt;ol>
&lt;li>Create a &lt;em>YAML&lt;/em> allowlist that describes the Java transform classes and methods that will be directly accessed from Python.&lt;/li>
&lt;li>Start an expansion service, using the &lt;code>javaClassLookupAllowlistFile&lt;/code> option to pass the path to the allowlist.&lt;/li>
&lt;li>Use the Python &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py">JavaExternalTransform&lt;/a> API to directly access Java transforms defined in the allowlist from the Python side.&lt;/li>
&lt;/ol>
&lt;p>Starting with Beam 2.36.0, steps 1 and 2 can be skipped, as described in the corresponding sections below.&lt;/p>
&lt;p>&lt;strong>Step 1&lt;/strong>&lt;/p>
&lt;p>To use an eligible Java transform from Python, define a &lt;em>YAML&lt;/em> allowlist. This allowlist lists the class names,
constructor methods, and builder methods that can be used directly from the Python side.&lt;/p>
&lt;p>Starting with Beam 2.35.0, you have the option to pass &lt;code>*&lt;/code> to the &lt;code>javaClassLookupAllowlistFile&lt;/code> option instead of defining an actual allowlist. The &lt;code>*&lt;/code> specifies that all supported transforms in the classpath of the expansion service can be accessed through the API. We encourage using an actual allowlist for production, because allowing clients to access arbitrary Java classes can pose a security risk.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>version: v1
allowedClasses:
- className: my.beam.transforms.JavaDataGenerator
  allowedConstructorMethods:
    - create
  allowedBuilderMethods:
    - withJavaDataGeneratorConfig&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
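&lt;p>The effect of the allowlist can be illustrated with a small Python sketch. It mirrors the YAML allowlist above as a plain dictionary and checks whether a requested class, constructor method, and builder methods are all listed. The function &lt;code>is_allowed&lt;/code> is hypothetical and only models the idea; it is not how the expansion service is implemented.&lt;/p>

```python
# The YAML allowlist above, written as a plain Python structure.
ALLOWLIST = {
    "version": "v1",
    "allowedClasses": [
        {
            "className": "my.beam.transforms.JavaDataGenerator",
            "allowedConstructorMethods": ["create"],
            "allowedBuilderMethods": ["withJavaDataGeneratorConfig"],
        }
    ],
}


def is_allowed(allowlist, class_name, constructor=None, builders=()):
    """Hypothetical check: True only if everything requested is listed."""
    if allowlist == "*":
        # Wildcard: every class and method is treated as permitted.
        return True
    for entry in allowlist["allowedClasses"]:
        if entry["className"] != class_name:
            continue
        if constructor is not None:
            if constructor not in entry["allowedConstructorMethods"]:
                return False
        return all(b in entry["allowedBuilderMethods"] for b in builders)
    # The class is not listed at all.
    return False


print(is_allowed(ALLOWLIST, "my.beam.transforms.JavaDataGenerator",
                 "create", ["withJavaDataGeneratorConfig"]))
print(is_allowed(ALLOWLIST, "java.lang.Runtime", "getRuntime"))
print(is_allowed("*", "java.lang.Runtime", "getRuntime"))
```

&lt;p>The last call shows why an explicit allowlist is preferable in production: with &lt;code>*&lt;/code>, any class on the classpath passes the check.&lt;/p>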
&lt;p>&lt;strong>Step 2&lt;/strong>&lt;/p>
&lt;p>Provide the allowlist as an argument when starting up the Java expansion service. For example, you can start the expansion service
as a local Java process using the following command:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>java -jar &amp;lt;jar file&amp;gt; &amp;lt;port&amp;gt; --javaClassLookupAllowlistFile=&amp;lt;path to the allowlist file&amp;gt;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>Starting with Beam 2.36.0, the &lt;code>JavaExternalTransform&lt;/code> API will automatically start up an expansion service with a given &lt;code>jar&lt;/code> file dependency if an expansion service address was not provided.&lt;/p>
&lt;p>&lt;strong>Step 3&lt;/strong>&lt;/p>
&lt;p>You can use the Java class directly from your Python pipeline using a stub transform created from the &lt;code>JavaExternalTransform&lt;/code> API. This API allows you to construct the transform using the Java class name and allows you to invoke builder methods to configure the class.&lt;/p>
&lt;p>Constructor and method parameter types are mapped between Python and Java using a Beam schema. The schema is auto-generated using the object types
provided on the Python side. If the Java class constructor method or builder method accepts any complex object types, make sure that the Beam schema
for these objects is registered and available to the Java expansion service. If a schema has not been registered, the Java expansion service will
try to register a schema using &lt;a href="/documentation/programming-guide/#creating-schemas">JavaFieldSchema&lt;/a>. In Python, arbitrary objects
can be represented using &lt;code>NamedTuple&lt;/code>s, which will be represented as Beam rows in the schema. Here is a Python stub transform that represents the
above-mentioned Java transform:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">JavaDataGeneratorConfig&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">NamedTuple&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s1">&amp;#39;JavaDataGeneratorConfig&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">[(&lt;/span>&lt;span class="s1">&amp;#39;prefix&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;length&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">int&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;suffix&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">)])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">data_config&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">JavaDataGeneratorConfig&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">prefix&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;start&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">length&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">20&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">suffix&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;end&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">java_transform&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">JavaExternalTransform&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s1">&amp;#39;my.beam.transforms.JavaDataGenerator&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">expansion_service&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;localhost:&amp;lt;port&amp;gt;&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numpy&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">int32&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">100&lt;/span>&lt;span class="p">))&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">withJavaDataGeneratorConfig&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data_config&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can use this transform in a Python pipeline along with other Python transforms. For a complete example, see &lt;a href="https://github.com/apache/beam/blob/master/examples/multi-language/python/javadatagenerator.py">javadatagenerator.py&lt;/a>.&lt;/p>
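&lt;p>As noted above, the schema is generated from the types declared on the Python side. The sketch below is not Beam&amp;rsquo;s schema inference code. It only shows, using the standard library, the field names and types that a &lt;code>NamedTuple&lt;/code> exposes and that such inference can draw on. The helper name &lt;code>describe_fields&lt;/code> is hypothetical.&lt;/p>

```python
import typing

# Same configuration type as in the stub transform above.
JavaDataGeneratorConfig = typing.NamedTuple(
    'JavaDataGeneratorConfig',
    [('prefix', str), ('length', int), ('suffix', str)])


def describe_fields(named_tuple_type):
    """Return (field name, Python type name) pairs in declaration order."""
    hints = typing.get_type_hints(named_tuple_type)
    return [(name, hints[name].__name__) for name in named_tuple_type._fields]


print(describe_fields(JavaDataGeneratorConfig))
# prints [('prefix', 'str'), ('length', 'int'), ('suffix', 'str')]
```

&lt;p>Each pair corresponds to one field of the row that represents the object on the Java side.&lt;/p>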
&lt;h5 id="13112-using-the-api-to-make-existing-java-transforms-available-to-other-sdks">13.1.1.2 Using the API to make existing Java transforms available to other SDKs&lt;/h5>
&lt;p>To make your Beam Java SDK transform portable across SDK languages, you must implement two interfaces: &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ExternalTransformBuilder.java">ExternalTransformBuilder&lt;/a> and &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/expansion/ExternalTransformRegistrar.java">ExternalTransformRegistrar&lt;/a>. The &lt;code>ExternalTransformBuilder&lt;/code> interface constructs the cross-language transform using configuration values passed in from the pipeline, and the &lt;code>ExternalTransformRegistrar&lt;/code> interface registers the cross-language transform for use with the expansion service.&lt;/p>
&lt;p>&lt;strong>Implementing the interfaces&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Define a Builder class for your transform that implements the &lt;code>ExternalTransformBuilder&lt;/code> interface and overrides the &lt;code>buildExternal&lt;/code> method, which builds your transform object. Define the initial configuration values for your transform in the &lt;code>buildExternal&lt;/code> method. In most cases, it&amp;rsquo;s convenient to make the Java transform builder class implement &lt;code>ExternalTransformBuilder&lt;/code>.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> &lt;code>ExternalTransformBuilder&lt;/code> requires you to define a configuration object (a simple POJO) to capture a set of parameters sent by external SDKs to initiate the Java transform. Usually these parameters directly map to constructor parameters of the Java transform.&lt;/p>
&lt;/blockquote>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@AutoValue.Builder&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">abstract&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">Builder&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">K&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">V&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">implements&lt;/span> &lt;span class="n">ExternalTransformBuilder&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">External&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Configuration&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PBegin&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">K&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">V&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">abstract&lt;/span> &lt;span class="n">Builder&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">K&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">V&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">setConsumerConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">config&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">abstract&lt;/span> &lt;span class="n">Builder&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">K&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">V&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">setTopics&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">topics&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="cm">/** Remaining property declarations omitted for clarity. */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">abstract&lt;/span> &lt;span class="n">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">K&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">V&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">PTransform&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">PBegin&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">K&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">V&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="nf">buildExternal&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">External&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Configuration&lt;/span> &lt;span class="n">config&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">setTopics&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ImmutableList&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">copyOf&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">config&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">topics&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="cm">/** Remaining property defaults omitted for clarity. */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For complete examples, see &lt;a href="https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaCountBuilder.java">JavaCountBuilder&lt;/a> and &lt;a href="https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaPrefixBuilder.java">JavaPrefixBuilder&lt;/a>.&lt;/p>
&lt;p>Note that the &lt;code>buildExternal&lt;/code> method can perform additional operations before setting properties received from external SDKs in the transform. For example, &lt;code>buildExternal&lt;/code> can validate properties available in the configuration object before setting them in the transform.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Register the transform as an external cross-language transform by defining a class that implements &lt;code>ExternalTransformRegistrar&lt;/code>. You must annotate your class with the &lt;code>AutoService&lt;/code> annotation to ensure that your transform is registered and instantiated properly by the expansion service.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In your registrar class, define a Uniform Resource Name (URN) for your transform. The URN must be a unique string that identifies your transform with the expansion service.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>From within your registrar class, define a configuration class for the parameters used during the initialization of your transform by the external SDK.&lt;/p>
&lt;p>The following example from the KafkaIO transform shows how to implement steps two through four:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@AutoService&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ExternalTransformRegistrar&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">External&lt;/span> &lt;span class="kd">implements&lt;/span> &lt;span class="n">ExternalTransformRegistrar&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">URN&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s">&amp;#34;beam:external:java:kafka:read:v1&amp;#34;&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Class&lt;/span>&lt;span class="o">&amp;lt;?&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">ExternalTransformBuilder&lt;/span>&lt;span class="o">&amp;lt;?,&lt;/span> &lt;span class="o">?,&lt;/span> &lt;span class="o">?&amp;gt;&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">knownBuilders&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">ImmutableMap&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">URN&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">Class&lt;/span>&lt;span class="o">&amp;lt;?&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">ExternalTransformBuilder&lt;/span>&lt;span class="o">&amp;lt;?,&lt;/span> &lt;span class="o">?,&lt;/span> &lt;span class="o">?&amp;gt;&amp;gt;)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">Class&lt;/span>&lt;span class="o">&amp;lt;?&amp;gt;)&lt;/span> &lt;span class="n">AutoValue_KafkaIO_Read&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Builder&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="cm">/** Parameters class to expose the Read transform to an external SDK. */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">Configuration&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">consumerConfig&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">topics&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">setConsumerConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">consumerConfig&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">consumerConfig&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">consumerConfig&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">setTopics&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">topics&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">topics&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">topics&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="cm">/** Remaining properties omitted for clarity. */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For additional examples, see &lt;a href="https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaCountRegistrar.java">JavaCountRegistrar&lt;/a> and &lt;a href="https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaPrefixRegistrar.java">JavaPrefixRegistrar&lt;/a>.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>After you have implemented the &lt;code>ExternalTransformBuilder&lt;/code> and &lt;code>ExternalTransformRegistrar&lt;/code> interfaces, your transform can be registered and created successfully by the default Java expansion service.&lt;/p>
&lt;p>&lt;strong>Starting the expansion service&lt;/strong>&lt;/p>
&lt;p>You can use an expansion service with multiple transforms in the same pipeline. The Beam Java SDK provides a default expansion service for Java transforms. You can also write your own expansion service, but that&amp;rsquo;s generally not needed, so it&amp;rsquo;s not covered in this section.&lt;/p>
&lt;p>To start a Java expansion service directly, do the following:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code># Build a JAR with both your transform and the expansion service
# Start the expansion service at the specified port.
$ java -jar /path/to/expansion_service.jar &amp;lt;PORT_NUMBER&amp;gt;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>The expansion service is now ready to serve transforms on the specified port.&lt;/p>
&lt;p>When creating SDK-specific wrappers for your transform, you may be able to use SDK-provided utilities to start up an expansion service. For example, the Python SDK provides the utilities &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.JavaJarExpansionService">&lt;code>JavaJarExpansionService&lt;/code>&lt;/a> and &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.BeamJarExpansionService">&lt;code>BeamJarExpansionService&lt;/code>&lt;/a> for starting up a Java expansion service using a JAR file.&lt;/p>
&lt;p>&lt;strong>Including dependencies&lt;/strong>&lt;/p>
&lt;p>If your transform requires external libraries, you can include them by adding them to the classpath of the expansion service. After they are included in the classpath, they will be staged when your transform is expanded by the expansion service.&lt;/p>
&lt;p>&lt;strong>Writing SDK-specific wrappers&lt;/strong>&lt;/p>
&lt;p>Your cross-language Java transform can be called through the lower-level &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.ExternalTransform">&lt;code>ExternalTransform&lt;/code>&lt;/a> class in a multi-language pipeline (as described in the next section); however, if possible, you should write an SDK-specific wrapper in the language of the pipeline (such as Python) to access the transform instead. This higher-level abstraction will make it easier for pipeline authors to use your transform.&lt;/p>
&lt;p>To create an SDK wrapper for use in a Python pipeline, do the following:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Create a Python module for your cross-language transform(s).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In the module, use one of the &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.PayloadBuilder">&lt;code>PayloadBuilder&lt;/code>&lt;/a> classes to build the payload for the initial cross-language transform expansion request.&lt;/p>
&lt;p>The parameter names and types of the payload should map to the parameter names and types of the configuration POJO provided to the Java &lt;code>ExternalTransformBuilder&lt;/code>. Parameter types are mapped across SDKs using a &lt;a href="https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/schema.proto">Beam schema&lt;/a>. Parameter names are mapped by converting Python underscore-separated variable names to camel case, the Java convention.&lt;/p>
&lt;p>In the following example, &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py">kafka.py&lt;/a> uses &lt;code>NamedTupleBasedPayloadBuilder&lt;/code> to build the payload. The parameters map to the Java &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java">KafkaIO.External.Configuration&lt;/a> config object defined in the previous section.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ReadFromKafkaSchema&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">NamedTuple&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">consumer_config&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Mapping&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">topics&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">List&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Other properties omitted for clarity.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">payload&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">NamedTupleBasedPayloadBuilder&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ReadFromKafkaSchema&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="o">...&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Start an expansion service, unless one is specified by the pipeline creator. The Beam Python SDK provides the utilities &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.JavaJarExpansionService">&lt;code>JavaJarExpansionService&lt;/code>&lt;/a> and &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.BeamJarExpansionService">&lt;code>BeamJarExpansionService&lt;/code>&lt;/a> for starting up an expansion service using a JAR file. &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.JavaJarExpansionService">&lt;code>JavaJarExpansionService&lt;/code>&lt;/a> can be used to start up an expansion service using the path (a local path or a URL) to a given JAR file. &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.BeamJarExpansionService">&lt;code>BeamJarExpansionService&lt;/code>&lt;/a> can be used to start an expansion service from a JAR released with Beam.&lt;/p>
&lt;p>For transforms released with Beam, do the following:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Add a Gradle target to Beam that can be used to build a shaded expansion service JAR for the target Java transform. This target should produce a Beam JAR that contains all dependencies needed for expanding the Java transform, and the JAR should be released with Beam. You might be able to use an existing Gradle target that offers an aggregated version of an expansion service JAR (for example, for all GCP IO).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In your Python module, instantiate &lt;code>BeamJarExpansionService&lt;/code> with the Gradle target.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">expansion_service&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BeamJarExpansionService&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;sdks:java:io:expansion-service:shadowJar&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>Add a Python wrapper transform class that extends &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.ExternalTransform">&lt;code>ExternalTransform&lt;/code>&lt;/a>. Pass the payload and expansion service defined above as parameters to the constructor of the &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.ExternalTransform">&lt;code>ExternalTransform&lt;/code>&lt;/a> parent class.&lt;/p>
&lt;/li>
&lt;/ol>
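&lt;p>The parameter name mapping described in step 2 can be illustrated with a few lines of Python. This is a sketch of the naming convention only, not the code Beam uses internally, and the helper name &lt;code>to_camel_case&lt;/code> is hypothetical.&lt;/p>

```python
def to_camel_case(python_name):
    """Convert an underscore-separated name to lower camel case."""
    head, *rest = python_name.split('_')
    # Upper-case only the first letter of each later part, leaving the rest as is.
    return head + ''.join(part[:1].upper() + part[1:] for part in rest)


# Field names taken from the ReadFromKafkaSchema example above.
for field in ('consumer_config', 'topics'):
    print(field, 'maps to', to_camel_case(field))
```

&lt;p>The converted names, &lt;code>consumerConfig&lt;/code> and &lt;code>topics&lt;/code>, match the fields of the Java configuration class shown in the previous section.&lt;/p>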
&lt;h4 id="1312-creating-cross-language-python-transforms">13.1.2. Creating cross-language Python transforms&lt;/h4>
&lt;p>Any Python transforms defined in the scope of the expansion service should be accessible by specifying their fully qualified names. For example, you could use Python&amp;rsquo;s &lt;code>ReadFromText&lt;/code> transform in a Java pipeline with its fully qualified name &lt;code>apache_beam.io.ReadFromText&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PythonExternalTransform&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">PBegin&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span>&lt;span class="n">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;apache_beam.io.ReadFromText&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKwarg&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;file_pattern&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">options&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getInputFile&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKwarg&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;validate&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kc">false&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>PythonExternalTransform&lt;/code> has other useful methods such as &lt;code>withExtraPackages&lt;/code> for staging PyPI package dependencies and &lt;code>withOutputCoder&lt;/code> for setting an output coder. If your transform exists in an external package, make sure to specify that package using &lt;code>withExtraPackages&lt;/code>, for example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PythonExternalTransform&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">PBegin&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span>&lt;span class="n">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;my_python_package.BeamReadPTransform&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withExtraPackages&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ImmutableList&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;my_python_package&amp;#34;&lt;/span>&lt;span class="o">)))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Alternatively, you can create a Python module that registers an existing Python transform as a cross-language transform for use with the Python expansion service and calls into that existing transform to perform its intended operation. The registered URN can then be used in an expansion request to indicate the expansion target.&lt;/p>
&lt;p>&lt;strong>Defining the Python module&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Define a Uniform Resource Name (URN) for your transform. The URN must be a unique string that identifies your transform with the expansion service.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">TEST_COMPK_URN&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;beam:transforms:xlang:test:compk&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>For an existing Python transform, create a new class to register the URN with the Python expansion service.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@ptransform.PTransform.register_urn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">TEST_COMPK_URN&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">CombinePerKeyTransform&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ptransform&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PTransform&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>From within the class, define an expand method that takes an input PCollection, runs the Python transform, and then returns the output PCollection.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pcoll&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">pcoll&lt;/span> \
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CombinePerKey&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">sum&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">with_output_types&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">typing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Tuple&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">int&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>As with other Python transforms, define a &lt;code>to_runner_api_parameter&lt;/code> method that returns the URN.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">to_runner_api_parameter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">unused_context&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">TEST_COMPK_URN&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kc">None&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Define a static &lt;code>from_runner_api_parameter&lt;/code> method that returns an instantiation of the cross-language Python transform.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@staticmethod&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">from_runner_api_parameter&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">unused_ptransform&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">unused_parameter&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">unused_context&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">CombinePerKeyTransform&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ol>
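&lt;p>The registration steps above can be pictured with a toy model. The sketch below is not Beam&amp;rsquo;s implementation and does not import Beam: &lt;code>register_urn&lt;/code>, &lt;code>expand_by_urn&lt;/code>, and the &lt;code>_KNOWN_URNS&lt;/code> dictionary are hypothetical stand-ins, and the method signatures are simplified. It only illustrates the idea that a registered URN maps to a factory that rebuilds the transform.&lt;/p>

```python
# Toy model of URN registration. This is NOT Beam's implementation.
from typing import Callable, Dict

# Hypothetical registry: URN string -> zero-argument factory.
_KNOWN_URNS: Dict[str, Callable[[], object]] = {}


def register_urn(urn: str):
    """Class decorator that records how to rebuild a transform from its URN."""
    def decorator(cls):
        if urn in _KNOWN_URNS:
            raise ValueError(f"URN already registered: {urn}")
        _KNOWN_URNS[urn] = cls.from_runner_api_parameter
        return cls
    return decorator


def expand_by_urn(urn: str):
    """Conceptually what a service does when asked to expand a URN."""
    try:
        factory = _KNOWN_URNS[urn]
    except KeyError:
        raise LookupError(f"No transform registered for URN: {urn}") from None
    return factory()


TEST_COMPK_URN = "beam:transforms:xlang:test:compk"


@register_urn(TEST_COMPK_URN)
class CombinePerKeyTransform:
    def to_runner_api_parameter(self):
        # Sending side: identify the transform by its URN, with no payload.
        return TEST_COMPK_URN, None

    @staticmethod
    def from_runner_api_parameter():
        # Receiving side: rebuild the transform from the URN alone.
        return CombinePerKeyTransform()
```

&lt;p>In this model, &lt;code>expand_by_urn(TEST_COMPK_URN)&lt;/code> returns a new &lt;code>CombinePerKeyTransform&lt;/code>, and an unregistered URN raises &lt;code>LookupError&lt;/code>.&lt;/p>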
&lt;p>&lt;strong>Starting the expansion service&lt;/strong>&lt;/p>
&lt;p>An expansion service can be used with multiple transforms in the same pipeline. The Beam Python SDK provides a default expansion service for you to use with your Python transforms. You are free to write your own expansion service, but that is generally not needed, so it is not covered in this section.&lt;/p>
&lt;p>Perform the following steps to start up the default Python expansion service directly:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Create a virtual environment and &lt;a href="/get-started/quickstart-py/">install the Apache Beam SDK&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Choose a port for the Python SDK expansion service and export it as an environment variable.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>$ export PORT_FOR_EXPANSION_SERVICE=12345
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/li>
&lt;li>
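&lt;p>Start the expansion service on that port. The module you run must import any modules that contain the transforms to be made available through the expansion service; the command below runs the test expansion service module that ships with the Beam Python SDK.&lt;/p>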
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>$ python -m apache_beam.runners.portability.expansion_service_test -p $PORT_FOR_EXPANSION_SERVICE --pickle_library=cloudpickle
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/li>
&lt;li>
&lt;p>The expansion service is now ready to serve transforms at the address &lt;code>localhost:$PORT_FOR_EXPANSION_SERVICE&lt;/code>.&lt;/p>
&lt;/li>
&lt;/ol>
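&lt;p>Before pointing a pipeline at the service, it can help to confirm that something is listening on the chosen port. The following is a minimal sketch using only the Python standard library. The helper name &lt;code>expansion_service_reachable&lt;/code> is hypothetical, and a successful TCP connection only shows that the port is open, not that the process behind it is a working expansion service.&lt;/p>

```python
import socket


def expansion_service_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

&lt;p>For the port exported above, the call would be &lt;code>expansion_service_reachable("localhost", 12345)&lt;/code>.&lt;/p>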
&lt;h4 id="1313-creating-cross-language-go-transforms">13.1.3. Creating cross-language Go transforms&lt;/h4>
&lt;p>Go currently does not support creating cross-language transforms, only using cross-language
transforms from other languages; see more at &lt;a href="https://github.com/apache/beam/issues/21767">Issue 21767&lt;/a>.&lt;/p>
&lt;h4 id="1314-defining-a-urn">13.1.4. Defining a URN&lt;/h4>
&lt;p>Developing a cross-language transform involves defining a URN for registering the transform with an expansion service. This section
provides a convention for defining such URNs. Following this convention is optional, but it helps ensure that your transform
does not conflict with transforms from other developers when they are registered in the same expansion service.&lt;/p>
&lt;h5 id="13141-schema">13.1.4.1. Schema&lt;/h5>
&lt;p>A URN should consist of the following components:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>ns-id&lt;/strong>: A namespace identifier. The recommended default is &lt;code>beam:transform&lt;/code>.&lt;/li>
&lt;li>&lt;strong>org-identifier&lt;/strong>: Identifies the organization where the transform was defined. Transforms defined in Apache Beam use &lt;code>org.apache.beam&lt;/code> for this.&lt;/li>
&lt;li>&lt;strong>functionality-identifier&lt;/strong>: Identifies the functionality of the cross-language transform.&lt;/li>
&lt;li>&lt;strong>version&lt;/strong>: A version number for the transform.&lt;/li>
&lt;/ul>
&lt;p>The schema for the URN convention is given below in &lt;a href="https://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_form">augmented Backus–Naur&lt;/a> form.
Keywords in upper case are from the &lt;a href="https://datatracker.ietf.org/doc/html/rfc8141">URN spec&lt;/a>.&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>transform-urn = ns-id &amp;#34;:&amp;#34; org-identifier &amp;#34;:&amp;#34; functionality-identifier &amp;#34;:&amp;#34; version
ns-id = (&amp;#34;beam&amp;#34; / NID) &amp;#34;:&amp;#34; &amp;#34;transform&amp;#34;
id-char = ALPHA / DIGIT / &amp;#34;-&amp;#34; / &amp;#34;.&amp;#34; / &amp;#34;_&amp;#34; / &amp;#34;~&amp;#34; ; A subset of characters allowed in a URN
org-identifier = 1*id-char
functionality-identifier = 1*id-char
version = &amp;#34;v&amp;#34; 1*(DIGIT / &amp;#34;.&amp;#34;) ; For example, &amp;#39;v1.2&amp;#39;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
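&lt;p>The grammar above translates fairly directly into a regular expression. The sketch below is one possible reading of it, not a Beam API: the names &lt;code>parse_transform_urn&lt;/code>, &lt;code>_ID&lt;/code>, and &lt;code>_NID&lt;/code> are hypothetical, and the &lt;code>NID&lt;/code> pattern assumes the RFC 8141 rule of 2 to 32 characters that start and end with a letter or digit and may contain hyphens in between.&lt;/p>

```python
import re

# id-char = ALPHA / DIGIT / "-" / "." / "_" / "~"
_ID = r"[A-Za-z0-9._~-]+"
# Assumed NID rule from RFC 8141; the literal "beam" also satisfies it.
_NID = r"[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]"
# transform-urn = ns-id ":" org-identifier ":" functionality-identifier ":" version
_TRANSFORM_URN = re.compile(rf"({_NID}):transform:({_ID}):({_ID}):(v[0-9.]+)")


def parse_transform_urn(urn: str):
    """Return (namespace, org, functionality, version), or None if no match."""
    match = _TRANSFORM_URN.fullmatch(urn)
    return match.groups() if match else None
```

&lt;p>Like the grammar, this pattern accepts any mix of digits and dots after the &lt;code>v&lt;/code>. A URN such as &lt;code>beam:transforms:xlang:test:compk&lt;/code>, used in the earlier test example, does not match, which is acceptable because the convention is optional.&lt;/p>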
&lt;h5 id="13142-examples">13.1.4.2. Examples&lt;/h5>
&lt;p>The following examples pair a transform with a URN that follows this convention.&lt;/p>
&lt;ul>
&lt;li>A transform offered with Apache Beam that writes Parquet files.
&lt;ul>
&lt;li>&lt;code>beam:transform:org.apache.beam:parquet_write:v1&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>A transform offered with Apache Beam that reads from Kafka with metadata.
&lt;ul>
&lt;li>&lt;code>beam:transform:org.apache.beam:kafka_read_with_metadata:v1&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>A transform developed by organization abc.org that reads from data store MyDatastore.
&lt;ul>
&lt;li>&lt;code>beam:transform:org.abc:mydatastore_read:v1&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
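&lt;p>If several URNs are defined in one code base, composing them from their components keeps them consistent. The sketch below is illustrative only: the &lt;code>TransformUrn&lt;/code> class is hypothetical, it performs no character validation, and its defaults (&lt;code>beam&lt;/code> and &lt;code>v1&lt;/code>) are assumptions taken from the examples above.&lt;/p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformUrn:
    """Components of a URN that follows the convention described above."""
    org: str
    functionality: str
    version: str = "v1"
    namespace: str = "beam"  # first half of the recommended ns-id

    def __str__(self) -> str:
        return (
            f"{self.namespace}:transform:{self.org}:"
            f"{self.functionality}:{self.version}"
        )


parquet_write = str(TransformUrn("org.apache.beam", "parquet_write"))
datastore_read = str(TransformUrn("org.abc", "mydatastore_read"))
```

&lt;p>Here &lt;code>parquet_write&lt;/code> and &lt;code>datastore_read&lt;/code> reproduce the first and third example URNs listed above.&lt;/p>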
&lt;h3 id="use-x-lang-transforms">13.2. Using cross-language transforms&lt;/h3>
&lt;p>Depending on the SDK language of the pipeline, you can use a high-level SDK-wrapper class, or a low-level transform class to access a cross-language transform.&lt;/p>
&lt;h4 id="1321-using-cross-language-transforms-in-a-java-pipeline">13.2.1. Using cross-language transforms in a Java pipeline&lt;/h4>
&lt;p>You have three options for using cross-language transforms in a Java pipeline. At the highest level of abstraction, some popular Python transforms are accessible through dedicated Java wrapper transforms. For example, the Java SDK has the &lt;code>DataframeTransform&lt;/code> class, which uses the Python SDK&amp;rsquo;s &lt;code>DataframeTransform&lt;/code>, and the &lt;code>RunInference&lt;/code> class, which uses the Python SDK&amp;rsquo;s &lt;code>RunInference&lt;/code>. When an SDK-specific wrapper transform is not available for a target Python transform, you can use the lower-level &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java">PythonExternalTransform&lt;/a> class instead by specifying the fully qualified name of the Python transform. If you want to use external transforms from SDKs other than Python (including the Java SDK itself), you can also use the lowest-level &lt;a href="https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java">External&lt;/a> class.&lt;/p>
&lt;p>&lt;strong>Using an SDK wrapper&lt;/strong>&lt;/p>
&lt;p>To use a cross-language transform through an SDK wrapper, import the module for the SDK wrapper and call it from your pipeline, as shown in the example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">org.apache.beam.sdk.extensions.python.transforms.DataframeTransform&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">input&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">DataframeTransform&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;lambda df: df.groupby(&amp;#39;a&amp;#39;).sum()&amp;#34;&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withIndexes&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Using the PythonExternalTransform class&lt;/strong>&lt;/p>
&lt;p>When an SDK-specific wrapper is not available, you can access the Python cross-language transform through the &lt;code>PythonExternalTransform&lt;/code> class by specifying the fully qualified name and the constructor arguments of the target Python transform.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">input&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PythonExternalTransform&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span>&lt;span class="n">from&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;apache_beam.dataframe.transforms.DataframeTransform&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKwarg&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;func&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">PythonCallableSource&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;lambda df: df.groupby(&amp;#39;a&amp;#39;).sum()&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKwarg&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;include_indexes&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="kc">true&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Using the External class&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Make sure any runtime environment dependencies (like the JRE) are available on your local machine, either installed directly or through a container. See the expansion service section for more details.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> When including Python transforms from within a Java pipeline, all Python dependencies have to be included in the SDK harness container.&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;li>
&lt;p>If an expansion service is not already available, start one for the SDK in the language of the transform you&amp;rsquo;re trying to consume.&lt;/p>
&lt;p>Make sure the transform you are trying to use is available and can be used by the expansion service.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Include &lt;a href="https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java">External.of(&amp;hellip;)&lt;/a> when instantiating your pipeline. Reference the URN, payload, and expansion service. For examples, see the &lt;a href="https://github.com/apache/beam/blob/master/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java">cross-language transform test suite&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>After the job has been submitted to the Beam runner, shut down the expansion service by terminating the expansion service process.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h4 id="1322-using-cross-language-transforms-in-a-python-pipeline">13.2.2. Using cross-language transforms in a Python pipeline&lt;/h4>
&lt;p>If a Python-specific wrapper for a cross-language transform is available, use that. Otherwise, you have to use the lower-level &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.ExternalTransform">&lt;code>ExternalTransform&lt;/code>&lt;/a> class to access the transform.&lt;/p>
&lt;p>&lt;strong>Using an SDK wrapper&lt;/strong>&lt;/p>
&lt;p>To use a cross-language transform through an SDK wrapper, import the module for the SDK wrapper and call it from your pipeline, as shown in the example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.io.kafka&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">ReadFromKafka&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">kafka_records&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;ReadFromKafka&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">ReadFromKafka&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">consumer_config&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;bootstrap.servers&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">bootstrap_servers&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;auto.offset.reset&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;earliest&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">topics&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">topic&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">max_num_records&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">max_num_records&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">expansion_service&lt;/span>&lt;span class="o">=&amp;lt;&lt;/span>&lt;span class="n">Address&lt;/span> &lt;span class="n">of&lt;/span> &lt;span class="n">expansion&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Using the ExternalTransform class&lt;/strong>&lt;/p>
&lt;p>When an SDK-specific wrapper isn&amp;rsquo;t available, you will have to access the cross-language transform through the &lt;code>ExternalTransform&lt;/code> class.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Make sure you have any runtime environment dependencies (like the JRE) installed on your local machine. See the expansion service section for more details.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If an expansion service is not already available, start one for the SDK in the language of the transform you&amp;rsquo;re trying to consume.
Python provides several classes for automatically starting Java expansion services, such as
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.JavaJarExpansionService">JavaJarExpansionService&lt;/a>
and &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.BeamJarExpansionService">BeamJarExpansionService&lt;/a>,
which can be passed directly as an expansion service to &lt;code>beam.ExternalTransform&lt;/code>.
Make sure the transform you&amp;rsquo;re trying to use is available and can be used by the expansion service.&lt;/p>
&lt;p>For Java, make sure the builder and registrar for the transform are available in the classpath of the expansion service.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Include &lt;code>ExternalTransform&lt;/code> when instantiating your pipeline. Reference the URN, payload, and expansion service.
You can use one of the available &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.PayloadBuilder">&lt;code>PayloadBuilder&lt;/code>&lt;/a> classes to build the payload for &lt;code>ExternalTransform&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">res&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="s1">&amp;#39;a&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;b&amp;#39;&lt;/span>&lt;span class="p">])&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">with_output_types&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ExternalTransform&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TEST_PREFIX_URN&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ImplicitSchemaPayloadBuilder&lt;/span>&lt;span class="p">({&lt;/span>&lt;span class="s1">&amp;#39;data&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;0&amp;#39;&lt;/span>&lt;span class="p">}),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">expansion&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">assert_that&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">res&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">equal_to&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="s1">&amp;#39;0a&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;0b&amp;#39;&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For additional examples, see &lt;a href="https://github.com/apache/beam/blob/master/examples/multi-language/python/addprefix.py">addprefix.py&lt;/a> and &lt;a href="https://github.com/apache/beam/blob/master/examples/multi-language/python/javacount.py">javacount.py&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>After the job has been submitted to the Beam runner, shut down any manually started expansion services by terminating the expansion service process.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Using the JavaExternalTransform class&lt;/strong>&lt;/p>
&lt;p>Python can invoke Java-defined transforms via &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.external.html#apache_beam.transforms.external.JavaExternalTransform">proxy objects&lt;/a>
as if they were Python transforms.
These are invoked as follows:&lt;/p>
&lt;pre>&lt;code class="language-py" data-lang="py">MyJavaTransform = beam.JavaExternalTransform('fully.qualified.ClassName', classpath=[jars])
with pipeline as p:
    res = (
        p
        | beam.Create(['a', 'b']).with_output_types(str)
        | MyJavaTransform(javaConstructorArg, ...).builderMethod(...))
    assert_that(res, equal_to(['0a', '0b']))
&lt;/code>&lt;/pre>
&lt;p>Python&amp;rsquo;s built-in &lt;code>getattr&lt;/code> function can be used when a Java method name is a reserved
Python keyword, such as &lt;code>from&lt;/code>.&lt;/p>
&lt;p>As with other external transforms, you can either provide a pre-started expansion service
or provide jar files that include the transform, its dependencies, and
Beam&amp;rsquo;s expansion service, in which case an expansion service is started automatically.&lt;/p>
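&lt;p>The reason &lt;code>getattr&lt;/code> is needed for names such as &lt;code>from&lt;/code> can be shown without Beam. In the sketch below, &lt;code>RecordingProxy&lt;/code> is a hypothetical toy class that only records the builder calls made on it, and &lt;code>withLimit&lt;/code> is an invented method name; neither belongs to &lt;code>JavaExternalTransform&lt;/code>.&lt;/p>

```python
class RecordingProxy:
    """Toy stand-in for a proxy object; it only records calls made on it."""

    def __init__(self, class_name, calls=()):
        self._class_name = class_name
        self._calls = tuple(calls)

    def __getattr__(self, method_name):
        # Called only for names not found normally, such as builder methods.
        if method_name.startswith("_"):
            raise AttributeError(method_name)

        def builder_method(*args):
            recorded = self._calls + ((method_name, args),)
            return RecordingProxy(self._class_name, recorded)

        return builder_method

    def calls(self):
        return list(self._calls)


proxy = RecordingProxy("fully.qualified.ClassName")
# proxy.from("topic") is a SyntaxError because "from" is a Python keyword,
# so the method is looked up by name instead.
configured = getattr(proxy, "from")("topic").withLimit(10)
print(configured.calls())  # [('from', ('topic',)), ('withLimit', (10,))]
```

&lt;p>Method names that are not keywords, such as the invented &lt;code>withLimit&lt;/code>, can still be called with ordinary attribute syntax.&lt;/p>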
&lt;h4 id="1323-using-cross-language-transforms-in-a-go-pipeline">13.2.3. Using cross-language transforms in a Go pipeline&lt;/h4>
&lt;p>If a Go-specific wrapper for a cross-language transform is available, use that. Otherwise, you have to use the
lower-level &lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#CrossLanguage">CrossLanguage&lt;/a>
function to access the transform.&lt;/p>
&lt;p>&lt;strong>Expansion Services&lt;/strong>&lt;/p>
&lt;p>The Go SDK supports automatically starting Java expansion services if an expansion address is not provided, although this is slower than
providing a persistent expansion service. Many wrapped Java transforms perform this automatically; if you wish to do this manually, use the &lt;code>xlangx&lt;/code> package&amp;rsquo;s
&lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.40.0/go/pkg/beam/core/runtime/xlangx#UseAutomatedJavaExpansionService">UseAutomatedJavaExpansionService()&lt;/a> function. To use Python cross-language transforms, you must manually start any necessary expansion
services on your local machine and ensure they are accessible to your code during pipeline construction.&lt;/p>
&lt;p>&lt;strong>Using an SDK wrapper&lt;/strong>&lt;/p>
&lt;p>To use a cross-language transform through an SDK wrapper, import the package for the SDK wrapper
and call it from your pipeline as shown in the example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/kafkaio&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Kafka Read using previously defined values.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="nx">kafkaRecords&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">kafkaio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">Read&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">expansionAddr&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="c1">// Address of expansion service.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="nx">bootstrapAddr&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">[]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">topicName&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">kafkaio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">MaxNumRecords&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">numRecords&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">kafkaio&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">ConsumerConfigs&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kd">map&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="kt">string&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s">&amp;#34;auto.offset.reset&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s">&amp;#34;earliest&amp;#34;&lt;/span>&lt;span class="p">}))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Using the CrossLanguage function&lt;/strong>&lt;/p>
&lt;p>When an SDK-specific wrapper isn&amp;rsquo;t available, you will have to access the cross-language transform through the &lt;code>beam.CrossLanguage&lt;/code> function.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Make sure you have the appropriate expansion service running. See the expansion service section for details.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Make sure the transform you&amp;rsquo;re trying to use is available and can be used by the expansion service.
Refer to &lt;a href="#create-x-lang-transforms">Creating cross-language transforms&lt;/a> for details.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use the &lt;code>beam.CrossLanguage&lt;/code> function in your pipeline as appropriate. Reference the URN, payload,
expansion service address, and define inputs and outputs. You can use the
&lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#CrossLanguagePayload">beam.CrossLanguagePayload&lt;/a>
function as a helper for encoding a payload. You can use the
&lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#UnnamedInput">beam.UnnamedInput&lt;/a> and
&lt;a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#UnnamedOutput">beam.UnnamedOutput&lt;/a>
functions as shortcuts for single, unnamed inputs/outputs or define a map for named ones.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">type&lt;/span> &lt;span class="nx">prefixPayload&lt;/span> &lt;span class="kd">struct&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">Data&lt;/span> &lt;span class="kt">string&lt;/span> &lt;span class="s">`beam:&amp;#34;data&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">urn&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="s">&amp;#34;beam:transforms:xlang:test:prefix&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">payload&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">CrossLanguagePayload&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">prefixPayload&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nx">Data&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nx">prefix&lt;/span>&lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">expansionAddr&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="s">&amp;#34;localhost:8097&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">outT&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">UnnamedOutput&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">typex&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">New&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">reflectx&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nx">String&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">res&lt;/span> &lt;span class="o">:=&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">CrossLanguage&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">s&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">urn&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">payload&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">expansionAddr&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">beam&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">UnnamedInput&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">inputPCol&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="nx">outT&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>After the job has been submitted to the Beam runner, shut down the expansion service by
terminating the expansion service process.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h4 id="1324-using-cross-language-transforms-in-a-typescript-pipeline">13.2.4. Using cross-language transforms in a TypeScript pipeline&lt;/h4>
&lt;p>Using a TypeScript wrapper for a cross-language pipeline is similar to using any
other transform, provided the dependencies (e.g. a recent Python interpreter or
a Java JRE) are available. For example, most of the TypeScript IOs are simply
wrappers around Beam transforms from other languages.&lt;/p>
&lt;p class="paragraph-wrap">If a wrapper is not already available, one can invoke the transform explicitly using
&lt;a href="https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/transforms/external.ts" target="_blank" rel="noopener noreferrer">apache_beam.transforms.external.rawExternalTransform&lt;/a>,
which takes a &lt;code>urn&lt;/code> (a string identifying the transform),
a &lt;code>payload&lt;/code> (a binary or JSON object parameterizing the transform),
and an &lt;code>expansionService&lt;/code>, which can be either the address of a pre-started service
or a callable returning an auto-started expansion service object.&lt;/p>
&lt;p>For example, one could write&lt;/p>
&lt;pre tabindex="0">&lt;code>pcoll.applyAsync(
rawExternalTransform(
&amp;#34;beam:registered:urn&amp;#34;,
{arg: value},
&amp;#34;localhost:expansion_service_port&amp;#34;
)
);
&lt;/code>&lt;/pre>&lt;p>Note that &lt;code>pcoll&lt;/code> must have a cross-language compatible coder such as &lt;code>SchemaCoder&lt;/code>.
This can be ensured with the &lt;a href="https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/transforms/internal.ts">withCoderInternal&lt;/a>
or &lt;a href="https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/transforms/internal.ts">withRowCoder&lt;/a>
transforms, e.g.&lt;/p>
&lt;pre tabindex="0">&lt;code>
const result = pcoll.apply(
beam.withRowCoder({ intFieldName: 0, stringFieldName: &amp;#34;&amp;#34; })
);
&lt;/code>&lt;/pre>&lt;p>A coder can also be specified on the output if it cannot be inferred, as the &lt;code>requestedOutputCoders&lt;/code> option in the examples below shows.&lt;/p>
&lt;p>In addition, there are several utilities such as &lt;a href="https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/transforms/python.ts">pythonTransform&lt;/a>
that make it easier to invoke transforms from specific languages:&lt;/p>
&lt;pre tabindex="0">&lt;code>
const result: PCollection&amp;lt;number&amp;gt; = await pcoll
.apply(
beam.withName(&amp;#34;UpdateCoder1&amp;#34;, beam.withRowCoder({ a: 0, b: 0 }))
)
.applyAsync(
pythonTransform(
// Fully qualified name
&amp;#34;apache_beam.transforms.Map&amp;#34;,
// Positional arguments
[pythonCallable(&amp;#34;lambda x: x.a &amp;#43; x.b&amp;#34;)],
// Keyword arguments
{},
// Output type if it cannot be inferred
{ requestedOutputCoders: { output: new VarIntCoder() } }
)
);
&lt;/code>&lt;/pre>&lt;p>Cross-language transforms can also be defined inline, which can be useful
for accessing features or libraries not available in the calling SDK:&lt;/p>
&lt;pre tabindex="0">&lt;code>
const result: PCollection&amp;lt;string&amp;gt; = await pcoll
.apply(withCoderInternal(new StrUtf8Coder()))
.applyAsync(
pythonTransform(
// Define an arbitrary transform from a callable.
&amp;#34;__callable__&amp;#34;,
[
pythonCallable(`
def apply(pcoll, prefix, postfix):
return pcoll | beam.Map(lambda s: prefix &amp;#43; s &amp;#43; postfix)
`),
],
// Keyword arguments to pass above, if any.
{ prefix: &amp;#34;x&amp;#34;, postfix: &amp;#34;y&amp;#34; },
// Output type if it cannot be inferred
{ requestedOutputCoders: { output: new StrUtf8Coder() } }
)
);
&lt;/code>&lt;/pre>&lt;h3 id="x-lang-transform-runner-support">13.3. Runner Support&lt;/h3>
&lt;p>Currently, portable runners such as Flink, Spark, and the direct runner can be used with multi-language pipelines.&lt;/p>
&lt;p>Dataflow supports multi-language pipelines through the Dataflow Runner v2 backend architecture.&lt;/p>
&lt;h3 id="x-lang-transform-tips-troubleshooting">13.4. Tips and Troubleshooting&lt;/h3>
&lt;p>For additional tips and troubleshooting information, see &lt;a href="https://cwiki.apache.org/confluence/display/BEAM/Multi-language+Pipelines+Tips">here&lt;/a>.&lt;/p>
&lt;h2 id="batched-dofns">14 Batched DoFns&lt;/h2>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;li data-value="go">Go SDK&lt;/li>
&lt;li data-value="typescript">TypeScript SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;p class="language-go language-java language-typescript">Batched DoFns are currently a Python-only feature.&lt;/p>
&lt;p class="language-py">Batched DoFns enable users to create modular, composable components that
operate on batches of multiple logical elements. These DoFns can leverage
vectorized Python libraries, like numpy, scipy, and pandas, which operate on
batches of data for efficiency.&lt;/p>
&lt;h3 id="batched-dofn-basics">14.1 Basics&lt;/h3>
&lt;p class="language-go language-java language-typescript">Batched DoFns are currently a Python-only feature.&lt;/p>
&lt;p class="language-py">A trivial Batched DoFn might look like this:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MultiplyByTwo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="c1"># Typehints declare the batch type consumed and produced&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">process_batch&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">batch&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ndarray&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">Iterator&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ndarray&lt;/span>&lt;span class="p">]:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">yield&lt;/span> &lt;span class="n">batch&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="c1"># Declare what the element-wise output type is&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">infer_output_type&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">input_element_type&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="n">input_element_type&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">This DoFn can be used in a Beam pipeline that otherwise operates on individual
elements. Beam will implicitly buffer elements and create numpy arrays on the
input side, and on the output side it will explode the numpy arrays back into
individual elements:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">])&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">with_output_types&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">int64&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">   &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">MultiplyByTwo&lt;/span>&lt;span class="p">())&lt;/span> &lt;span class="c1"># Implicit buffering and batch creation&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">   &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">/&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">))&lt;/span>  &lt;span class="c1"># Implicit batch explosion&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">Note that we use
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.ptransform.html#apache_beam.transforms.ptransform.PTransform.with_output_types">&lt;code>PTransform.with_output_types&lt;/code>&lt;/a> to
set the &lt;em>element-wise&lt;/em> typehint for the output of &lt;code>beam.Create&lt;/code>. Then, when
&lt;code>MultiplyByTwo&lt;/code> is applied to this &lt;code>PCollection&lt;/code>, Beam recognizes that
&lt;code>np.ndarray&lt;/code> is an acceptable batch type to use in conjunction with &lt;code>np.int64&lt;/code>
elements. We will use numpy typehints like these throughout this guide, but
Beam supports typehints from other libraries as well; see &lt;a href="#batched-dofn-types">Supported Batch
Types&lt;/a>.&lt;/p>
&lt;p class="language-py">In the previous case, Beam will implicitly create and explode batches at the
input and output boundaries. However, if Batched DoFns with equivalent types are
chained together, this batch creation and explosion will be elided. The batches
will be passed straight through! This makes it much simpler to efficiently
compose transforms that operate on batches.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="p">(&lt;/span>&lt;span class="n">p&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">4&lt;/span>&lt;span class="p">])&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">with_output_types&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">int64&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">   &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">MultiplyByTwo&lt;/span>&lt;span class="p">())&lt;/span> &lt;span class="c1"># Implicit buffering and batch creation&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">   &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">MultiplyByTwo&lt;/span>&lt;span class="p">())&lt;/span> &lt;span class="c1"># Batches passed through&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">   &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">MultiplyByTwo&lt;/span>&lt;span class="p">()))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
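&lt;p class="language-py">The buffering, pass-through, and explosion steps described above can be modelled in plain Python. The following is a conceptual sketch only, not Beam's implementation: the helper names (&lt;code>make_batches&lt;/code>, &lt;code>multiply_batch_by_two&lt;/code>, &lt;code>explode&lt;/code>, &lt;code>run_chain&lt;/code>) and the fixed batch size are assumptions made for illustration, and plain lists stand in for numpy arrays.&lt;/p>

```python
from itertools import islice
from typing import Iterable, Iterator, List


def make_batches(elements: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Buffer individual elements into lists of at most batch_size items."""
    iterator = iter(elements)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def multiply_batch_by_two(batches: Iterable[List[int]]) -> Iterator[List[int]]:
    """Batch-in, batch-out stage: consumes batches and yields batches."""
    for batch in batches:
        yield [value * 2 for value in batch]


def explode(batches: Iterable[List[int]]) -> Iterator[int]:
    """Turn batches back into individual elements."""
    for batch in batches:
        yield from batch


def run_chain(elements: Iterable[int], batch_size: int = 2) -> List[int]:
    # Batches are created once, passed straight through three batched
    # stages, and exploded once at the end.
    batches = make_batches(elements, batch_size)
    for _ in range(3):
        batches = multiply_batch_by_two(batches)
    return list(explode(batches))


if __name__ == "__main__":
    print(run_chain([1, 2, 3, 4]))  # [8, 16, 24, 32]
```

&lt;p class="language-py">In this model, batches are built and taken apart only at the boundaries of the chain, which mirrors why chaining Batched DoFns with equivalent types avoids repeated conversion.&lt;/p>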
&lt;h3 id="batched-dofn-elementwise">14.2 Element-wise Fallback&lt;/h3>
&lt;p class="language-go language-java language-typescript">Batched DoFns are currently a Python-only feature.&lt;/p>
&lt;p class="language-py">For some DoFns you may be able to provide both a batched and an element-wise
implementation of your desired logic. You can do this by simply defining both
&lt;code>process&lt;/code> and &lt;code>process_batch&lt;/code>:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MultiplyByTwo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">element&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">int64&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">Iterator&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">int64&lt;/span>&lt;span class="p">]:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="c1"># Multiply an individual int64 by 2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">yield&lt;/span> &lt;span class="n">element&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">process_batch&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">batch&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ndarray&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">Iterator&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ndarray&lt;/span>&lt;span class="p">]:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="c1"># Multiply a _batch_ of int64s by 2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">yield&lt;/span> &lt;span class="n">batch&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">2&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-py">When executing this DoFn, Beam will select the best implementation to use given
the context. Generally, if the inputs to a DoFn are already batched, Beam will
use the batched implementation; otherwise, it will use the element-wise
implementation defined in the &lt;code>process&lt;/code> method.&lt;/p>
&lt;p class="language-py">Note that, in this case, there is no need to define &lt;code>infer_output_type&lt;/code>. This is
because Beam can get the output type from the typehint on &lt;code>process&lt;/code>.&lt;/p>
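&lt;p class="language-py">Because Beam may choose either implementation, it is generally worth checking that the two agree. A minimal sketch of such a check in plain Python follows. It assumes a hypothetical stand-in class, &lt;code>MultiplyByTwoLogic&lt;/code>, which does not subclass &lt;code>beam.DoFn&lt;/code>, and it uses lists in place of numpy arrays; the helper &lt;code>implementations_agree&lt;/code> is likewise an illustrative name, not part of Beam.&lt;/p>

```python
from typing import Iterator, List


class MultiplyByTwoLogic:
    """Hypothetical stand-in that mirrors the process/process_batch pair."""

    def process(self, element: int) -> Iterator[int]:
        # Element-wise implementation: one element in, one element out.
        yield element * 2

    def process_batch(self, batch: List[int]) -> Iterator[List[int]]:
        # Batched implementation: one batch in, one batch out.
        yield [value * 2 for value in batch]


def implementations_agree(fn: MultiplyByTwoLogic, elements: List[int]) -> bool:
    """Compare element-wise results with exploded batched results."""
    elementwise = [out for element in elements for out in fn.process(element)]
    batched = [out for batch in fn.process_batch(elements) for out in batch]
    return elementwise == batched


if __name__ == "__main__":
    print(implementations_agree(MultiplyByTwoLogic(), [1, 2, 3, 4]))  # True
```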
&lt;h3 id="batched-dofn-batch-production">14.3 Batch Production vs. Batch Consumption&lt;/h3>
&lt;p class="language-go language-java language-typescript">Batched DoFns are currently a Python-only feature.&lt;/p>
&lt;p class="language-py">By convention, Beam assumes that the &lt;code>process_batch&lt;/code> method, which consumes
batched inputs, will also produce batched outputs. Similarly, Beam assumes the
&lt;code>process&lt;/code> method will produce individual elements. This can be overridden with
the &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.yields_elements">&lt;code>@beam.DoFn.yields_elements&lt;/code>&lt;/a> and
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.yields_batches">&lt;code>@beam.DoFn.yields_batches&lt;/code>&lt;/a> decorators. For example:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Consumes elements, produces batches&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">ReadFromFile&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="nd">@beam.DoFn.yields_batches&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">path&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">str&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">Iterator&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ndarray&lt;/span>&lt;span class="p">]:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">yield&lt;/span> &lt;span class="n">array&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="c1"># Declare what the element-wise output type is&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">infer_output_type&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">input_element_type&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">int64&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Consumes batches, produces elements&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">WriteToFile&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="nd">@beam.DoFn.yields_elements&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">process_batch&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">batch&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ndarray&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">Iterator&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="nb">str&lt;/span>&lt;span class="p">]:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">yield&lt;/span> &lt;span class="n">output_path&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
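&lt;p class="language-py">The two shapes above can also be illustrated without Beam. The following is only a plain-Python model under stated assumptions: the hypothetical &lt;code>read_chunks&lt;/code> consumes one element (a comma-separated string standing in for a file path) and yields batches, while the hypothetical &lt;code>summarize_batch&lt;/code> consumes a batch and yields a single element. Lists stand in for numpy arrays.&lt;/p>

```python
from typing import Iterator, List


def read_chunks(line: str, chunk_size: int = 2) -> Iterator[List[int]]:
    """Element in, batches out: parse one string and yield lists of ints."""
    values = [int(token) for token in line.split(",") if token.strip()]
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def summarize_batch(batch: List[int]) -> Iterator[str]:
    """Batch in, element out: yield one summary string per batch."""
    yield f"{len(batch)} values, sum={sum(batch)}"


if __name__ == "__main__":
    for chunk in read_chunks("1,2,3,4,5"):
        for summary in summarize_batch(chunk):
            print(summary)
```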
&lt;h3 id="batched-dofn-types">14.4 Supported Batch Types&lt;/h3>
&lt;p class="language-go language-java language-typescript">Batched DoFns are currently a Python-only feature.&lt;/p>
&lt;p class="language-py">We’ve used numpy types in the Batched DoFn implementations in this guide –
&lt;code>np.int64&lt;/code> as the element typehint and &lt;code>np.ndarray&lt;/code> as the corresponding
batch typehint – but Beam supports typehints from other libraries as well.&lt;/p>
&lt;h4 id="numpyhttpsgithubcomapachebeamblobmastersdkspythonapache_beamtypehintsbatchpy">&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py">numpy&lt;/a>&lt;/h4>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Element Typehint&lt;/th>
&lt;th>Batch Typehint&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Numeric types (&lt;code>int&lt;/code>, &lt;code>np.int32&lt;/code>, &lt;code>bool&lt;/code>, &amp;hellip;)&lt;/td>
&lt;td>&lt;code>np.ndarray&lt;/code> (or &lt;code>NumpyArray&lt;/code>)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h4 id="pandashttpsgithubcomapachebeamblobmastersdkspythonapache_beamtypehintspandas_type_compatibilitypy">&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/pandas_type_compatibility.py">pandas&lt;/a>&lt;/h4>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Element Typehint&lt;/th>
&lt;th>Batch Typehint&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Numeric types (&lt;code>int&lt;/code>, &lt;code>np.int32&lt;/code>, &lt;code>bool&lt;/code>, &amp;hellip;)&lt;/td>
&lt;td>&lt;code>pd.Series&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>bytes&lt;/code>&lt;/td>
&lt;td>&lt;code>pd.Series&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>Any&lt;/code>&lt;/td>
&lt;td>&lt;code>pd.Series&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="#schemas">Beam Schema Types&lt;/a>&lt;/td>
&lt;td>&lt;code>pd.DataFrame&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h4 id="pyarrowhttpsgithubcomapachebeamblobmastersdkspythonapache_beamtypehintsarrow_type_compatibilitypy">&lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/arrow_type_compatibility.py">pyarrow&lt;/a>&lt;/h4>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Element Typehint&lt;/th>
&lt;th>Batch Typehint&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Numeric types (&lt;code>int&lt;/code>, &lt;code>np.int32&lt;/code>, &lt;code>bool&lt;/code>, &amp;hellip;)&lt;/td>
&lt;td>&lt;code>pa.Array&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>Any&lt;/code>&lt;/td>
&lt;td>&lt;code>pa.Array&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>List&lt;/code>&lt;/td>
&lt;td>&lt;code>pa.Array&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>Mapping&lt;/code>&lt;/td>
&lt;td>&lt;code>pa.Array&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="#schemas">Beam Schema Types&lt;/a>&lt;/td>
&lt;td>&lt;code>pa.Table&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h4 id="other-types">Other types?&lt;/h4>
&lt;p>If there are other batch types you would like to use with Batched DoFns, please
&lt;a href="https://github.com/apache/beam/issues/new/choose">file an issue&lt;/a>.&lt;/p>
&lt;h3 id="batched-dofn-dynamic-types">14.5 Dynamic Batch Input and Output Types&lt;/h3>
&lt;p class="language-go language-java language-typescript">Batched DoFns are currently a Python-only feature.&lt;/p>
&lt;p class="language-py">For some Batched DoFns, it may not be sufficient to declare batch types
statically, with typehints on &lt;code>process&lt;/code> and/or &lt;code>process_batch&lt;/code>. You may need to
declare these types dynamically. You can do this by overriding the
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.get_input_batch_type">&lt;code>get_input_batch_type&lt;/code>&lt;/a>
and
&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.get_output_batch_type">&lt;code>get_output_batch_type&lt;/code>&lt;/a>
methods on your DoFn:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Utilize Beam&amp;#39;s parameterized NumpyArray typehint&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.typehints.batch&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">NumpyArray&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">MultiplyByTwo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="c1"># No typehints needed&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">process_batch&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">batch&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">yield&lt;/span> &lt;span class="n">batch&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">get_input_batch_type&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">input_element_type&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="n">NumpyArray&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">input_element_type&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">get_output_batch_type&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">input_element_type&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="n">NumpyArray&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">input_element_type&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">infer_output_type&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">input_element_type&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="n">input_element_type&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="batched-dofn-event-time">14.6 Batches and Event-time Semantics&lt;/h3>
&lt;p class="language-go language-java language-typescript">Batched DoFns are currently a Python-only feature.&lt;/p>
&lt;p class="language-py">Currently, batches must have a single set of timing information (event time,
windows, etc.) that applies to every logical element in the batch. There is
no mechanism to create batches that span multiple timestamps. However,
it is possible to retrieve this timing information in Batched DoFn
implementations by using the conventional
&lt;code>DoFn.*Param&lt;/code> attributes:&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">RetrieveTimingDoFn&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">process_batch&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">batch&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ndarray&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">timestamp&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">TimestampParam&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">pane_info&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">DoFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PaneInfoParam&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="p">)&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="n">Iterator&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ndarray&lt;/span>&lt;span class="p">]:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  &lt;span class="k">def&lt;/span> &lt;span class="nf">infer_output_type&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">input_type&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="n">input_type&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
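&lt;p>The single-timestamp constraint can be illustrated without Beam. The following is a minimal plain-Python sketch, not Beam&amp;rsquo;s actual batching logic: the helper &lt;code>batch_by_timestamp&lt;/code> and its &lt;code>max_batch_size&lt;/code> parameter are hypothetical names, and windows and pane information are ignored for brevity. It shows that elements are only grouped together when they share a timestamp, so every batch can carry exactly one.&lt;/p>

```python
from itertools import groupby
from operator import itemgetter


def batch_by_timestamp(elements, max_batch_size):
    """Group consecutive (timestamp, value) pairs into single-timestamp batches.

    Hypothetical helper for illustration only; Beam does not expose this function.
    """
    batches = []
    for timestamp, group in groupby(elements, key=itemgetter(0)):
        values = [value for _, value in group]
        # Split a run of same-timestamp values into chunks of bounded size.
        for start in range(0, len(values), max_batch_size):
            batches.append((timestamp, values[start:start + max_batch_size]))
    return batches


elements = [(10, "a"), (10, "b"), (10, "c"), (20, "d")]
batches = batch_by_timestamp(elements, max_batch_size=2)
for timestamp, values in batches:
    print(timestamp, values)
```

&lt;p>No batch in the result mixes values from timestamps 10 and 20, even though a larger batch would otherwise have been possible.&lt;/p>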
&lt;h2 id="transform-service">15 Transform service&lt;/h2>
&lt;p>The Apache Beam SDK versions 2.49.0 and later include a &lt;a href="https://docs.docker.com/compose/">Docker Compose&lt;/a>
service named &lt;em>Transform service&lt;/em>.&lt;/p>
&lt;p>The following diagram illustrates the basic architecture of the Transform service.&lt;/p>
&lt;p>&lt;img src="/images/transform_service.png" alt="Diagram of the Transform service architecture">&lt;/p>
&lt;p>To use the Transform service, Docker must be available on the machine that starts the service.&lt;/p>
&lt;p>The Transform service has several primary use cases.&lt;/p>
&lt;h3 id="transform-service-usage-upgrade">15.1 Using the Transform service to upgrade transforms&lt;/h3>
&lt;p>The Transform service can upgrade (or downgrade) the Beam SDK version of supported individual transforms in a pipeline without changing the Beam version of the pipeline itself.
This feature is currently available only for Beam Java SDK 2.53.0 and later. The following transforms can currently be upgraded:&lt;/p>
&lt;ul>
&lt;li>BigQuery read transform (URN: &lt;em>beam:transform:org.apache.beam:bigquery_read:v1&lt;/em>)&lt;/li>
&lt;li>BigQuery write transform (URN: &lt;em>beam:transform:org.apache.beam:bigquery_write:v1&lt;/em>)&lt;/li>
&lt;li>Kafka read transform (URN: &lt;em>beam:transform:org.apache.beam:kafka_read_with_metadata:v2&lt;/em>)&lt;/li>
&lt;li>Kafka write transform (URN: &lt;em>beam:transform:org.apache.beam:kafka_write:v2&lt;/em>)&lt;/li>
&lt;/ul>
&lt;p>To use this feature, run a Java pipeline with additional pipeline options that specify the URNs of the transforms to upgrade and the Beam version to upgrade them to. All transforms in the pipeline with matching URNs are upgraded.&lt;/p>
&lt;p>For example, to upgrade the BigQuery read transform for a pipeline run using Beam &lt;code>2.53.0&lt;/code> to a future Beam version &lt;code>2.xy.z&lt;/code>, specify the following additional pipeline options.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="o">--&lt;/span>&lt;span class="n">transformsToOverride&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">beam&lt;/span>&lt;span class="o">:&lt;/span>&lt;span class="n">transform&lt;/span>&lt;span class="o">:&lt;/span>&lt;span class="n">org&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apache&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">beam&lt;/span>&lt;span class="o">:&lt;/span>&lt;span class="n">bigquery_read&lt;/span>&lt;span class="o">:&lt;/span>&lt;span class="n">v1&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">transformServiceBeamVersion&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">xy&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">z&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">This&lt;/span> &lt;span class="n">feature&lt;/span> &lt;span class="ow">is&lt;/span> &lt;span class="n">currently&lt;/span> &lt;span class="ow">not&lt;/span> &lt;span class="n">available&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">Python&lt;/span> &lt;span class="n">SDK&lt;/span>&lt;span class="o">.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">This&lt;/span> &lt;span class="nx">feature&lt;/span> &lt;span class="nx">is&lt;/span> &lt;span class="nx">currently&lt;/span> &lt;span class="nx">not&lt;/span> &lt;span class="nx">available&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="nx">Go&lt;/span> &lt;span class="nx">SDK&lt;/span>&lt;span class="p">.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Note that the framework automatically downloads the relevant Docker containers and starts the Transform service for you.&lt;/p>
&lt;p>For a full example that uses this feature to upgrade the BigQuery read and write transforms, see the &lt;a href="https://cwiki.apache.org/confluence/display/BEAM/Transform+Service#TransformService-Upgradetransformswithoutupgradingthepipeline">Transform service developer guide&lt;/a>.&lt;/p>
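&lt;p>When several transforms need upgrading, the two options can be assembled programmatically. The sketch below is illustrative only: &lt;code>upgrade_options&lt;/code> and &lt;code>SUPPORTED_URNS&lt;/code> are hypothetical names, and it assumes that multiple URNs are passed to &lt;code>--transformsToOverride&lt;/code> as a single comma-separated value, which you should verify against the Beam version you use.&lt;/p>

```python
# URNs listed in this section as currently available for upgrading.
SUPPORTED_URNS = (
    "beam:transform:org.apache.beam:bigquery_read:v1",
    "beam:transform:org.apache.beam:bigquery_write:v1",
    "beam:transform:org.apache.beam:kafka_read_with_metadata:v2",
    "beam:transform:org.apache.beam:kafka_write:v2",
)


def upgrade_options(urns, target_version):
    """Return the extra pipeline options for upgrading the given transforms.

    Hypothetical helper; it only builds strings and does not call Beam.
    """
    unsupported = [urn for urn in urns if urn not in SUPPORTED_URNS]
    if unsupported:
        raise ValueError("Not listed as upgradable: " + ", ".join(unsupported))
    return [
        # Assumption: multiple URNs are joined into one comma-separated value.
        "--transformsToOverride=" + ",".join(urns),
        "--transformServiceBeamVersion=" + target_version,
    ]


options = upgrade_options(
    ["beam:transform:org.apache.beam:bigquery_read:v1"], "2.xy.z")
print(" ".join(options))
```

&lt;p>For a single URN, the result matches the options shown in the example above. Rejecting URNs that are not in the list catches typos before the pipeline is submitted.&lt;/p>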
&lt;h3 id="transform-service-usage-multi-language">15.2 Using the Transform service for multi-language pipelines&lt;/h3>
&lt;p>The Transform service implements the Beam expansion API, so Beam multi-language pipelines can use it to expand the transforms that it provides.
The main advantage is that multi-language pipelines can run without installing support for additional language runtimes. For example, Beam Python pipelines that use Java transforms such as
&lt;code>KafkaIO&lt;/code> can run without Java installed locally during job submission, as long as Docker is available on the system.&lt;/p>
&lt;p>In some cases, Apache Beam SDKs can automatically start the Transform service.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The Java &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java">&lt;code>PythonExternalTransform&lt;/code> API&lt;/a> automatically
starts the Transform service when a Python runtime isn&amp;rsquo;t available locally, but Docker is.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The Apache Beam Python multi-language wrappers might automatically start the Transform service when you&amp;rsquo;re using Java transforms, a Java language runtime isn&amp;rsquo;t available locally, and Docker is available locally.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Beam users can also &lt;a href="/documentation/programming-guide/#transform-service-usage-muanual">manually start&lt;/a> a Transform service and use it as the expansion service for multi-language pipelines.&lt;/p>
&lt;h3 id="transform-service-usage-muanual">15.3 Manually starting the Transform service&lt;/h3>
&lt;p>A Beam Transform service instance can be manually started by using utilities provided with Apache Beam SDKs.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">java&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="n">jar&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">sdks&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">java&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">transform&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">service&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">app&lt;/span>&lt;span class="o">-&amp;lt;&lt;/span>&lt;span class="n">Beam&lt;/span> &lt;span class="n">version&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">jar&lt;/span>&lt;span class="o">&amp;gt;.&lt;/span>&lt;span class="na">jar&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">port&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">port&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">beam_version&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Beam&lt;/span> &lt;span class="n">version&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">transform&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">project_name&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">a&lt;/span> &lt;span class="n">unique&lt;/span> &lt;span class="n">ID&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">transform&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">command&lt;/span> &lt;span class="n">up&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">python&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="n">m&lt;/span> &lt;span class="n">apache_beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">utils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">transform_service_launcher&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">port&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">port&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">beam_version&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Beam&lt;/span> &lt;span class="n">version&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">transform&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">project_name&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">a&lt;/span> &lt;span class="n">unique&lt;/span> &lt;span class="n">ID&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">transform&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">command&lt;/span> &lt;span class="n">up&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">This&lt;/span> &lt;span class="nx">feature&lt;/span> &lt;span class="nx">is&lt;/span> &lt;span class="nx">currently&lt;/span> &lt;span class="nx">in&lt;/span> &lt;span class="nx">development&lt;/span>&lt;span class="p">.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>To stop the transform service, use the following commands.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">java&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="n">jar&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">sdks&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">java&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">transform&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">service&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="n">app&lt;/span>&lt;span class="o">-&amp;lt;&lt;/span>&lt;span class="n">Beam&lt;/span> &lt;span class="n">version&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">jar&lt;/span>&lt;span class="o">&amp;gt;.&lt;/span>&lt;span class="na">jar&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">port&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">port&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">beam_version&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Beam&lt;/span> &lt;span class="n">version&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">transform&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">project_name&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">a&lt;/span> &lt;span class="n">unique&lt;/span> &lt;span class="n">ID&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">transform&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">command&lt;/span> &lt;span class="n">down&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="n">python&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="n">m&lt;/span> &lt;span class="n">apache_beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">utils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">transform_service_launcher&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">port&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">port&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">beam_version&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Beam&lt;/span> &lt;span class="n">version&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">transform&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">project_name&lt;/span> &lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">a&lt;/span> &lt;span class="n">unique&lt;/span> &lt;span class="n">ID&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">the&lt;/span> &lt;span class="n">transform&lt;/span> &lt;span class="n">service&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">--&lt;/span>&lt;span class="n">command&lt;/span> &lt;span class="n">down&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-go snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">This&lt;/span> &lt;span class="nx">feature&lt;/span> &lt;span class="nx">is&lt;/span> &lt;span class="nx">currently&lt;/span> &lt;span class="nx">in&lt;/span> &lt;span class="nx">development&lt;/span>&lt;span class="p">.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
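&lt;p>The start and stop commands differ only in the value of &lt;code>--command&lt;/code>, so a script that manages the service can build both from the same parameters. The following sketch is one possible way to do that for the Python launcher shown above; &lt;code>launcher_command&lt;/code> is a hypothetical helper, and the port, Beam version, and project name are placeholder values.&lt;/p>

```python
import shlex


def launcher_command(port, beam_version, project_name, command):
    """Build the argument list for the Python Transform service launcher.

    Hypothetical helper; it uses only the flags shown in this section.
    """
    if command not in ("up", "down"):
        raise ValueError("command must be 'up' or 'down'")
    return [
        "python", "-m", "apache_beam.utils.transform_service_launcher",
        "--port", str(port),
        "--beam_version", beam_version,
        "--project_name", project_name,
        "--command", command,
    ]


# Placeholder values; substitute your own port, version, and project name.
start = launcher_command(12345, "2.53.0", "my-transform-service", "up")
stop = launcher_command(12345, "2.53.0", "my-transform-service", "down")
print(shlex.join(start))
print(shlex.join(stop))
```

&lt;p>Each list can be passed to &lt;code>subprocess.run&lt;/code> as is. Reusing the same port, version, and project name for both calls ensures that the stop command targets the instance that was started.&lt;/p>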
&lt;h3 id="transform-service-included-transforms">15.4 Portable transforms included in the Transform service&lt;/h3>
&lt;p>The Beam Transform service includes a number of transforms implemented in the Apache Beam Java and Python SDKs.&lt;/p>
&lt;p>Currently, the following transforms are included in the Transform service:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Java transforms: Google Cloud I/O connectors, the Kafka I/O connector, and the JDBC I/O connector&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Python transforms: all portable transforms implemented within the Apache Beam Python SDK, such as
&lt;a href="/documentation/transforms/python/elementwise/runinference/">RunInference&lt;/a> and
&lt;a href="/documentation/dsls/dataframes/overview/">DataFrame&lt;/a> transforms.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>For a more comprehensive list of available transforms, see the
&lt;a href="https://cwiki.apache.org/confluence/display/BEAM/Transform+Service#TransformService-TransformsincludedintheTransformservice">Transform service&lt;/a> developer guide.&lt;/p></description></item><item><title>Documentation: BigQuery ML integration</title><link>/documentation/patterns/bqml/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/patterns/bqml/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="bigquery-ml-integration">BigQuery ML integration&lt;/h1>
&lt;p>The samples on this page demonstrate how to integrate models exported from &lt;a href="https://cloud.google.com/bigquery-ml/docs">BigQuery ML (BQML)&lt;/a> into your Apache Beam pipeline using &lt;a href="https://github.com/tensorflow/tfx-bsl">TFX Basic Shared Libraries (tfx_bsl)&lt;/a>.&lt;/p>
&lt;p>The sections below cover the following steps in more detail:&lt;/p>
&lt;ol>
&lt;li>Create and train your BigQuery ML model&lt;/li>
&lt;li>Export your BigQuery ML model&lt;/li>
&lt;li>Create a transform that uses the brand-new BigQuery ML model&lt;/li>
&lt;/ol>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="create-and-train-your-bigquery-ml-model">Create and train your BigQuery ML model&lt;/h2>
&lt;p>To be able to incorporate your BQML model into an Apache Beam pipeline using tfx_bsl, it has to be in the &lt;a href="https://www.tensorflow.org/guide/saved_model">TensorFlow SavedModel&lt;/a> format. An overview that maps different model types to their export model format for BQML can be found &lt;a href="https://cloud.google.com/bigquery-ml/docs/exporting-models#export_model_formats_and_samples">here&lt;/a>.&lt;/p>
&lt;p>For the sake of simplicity, we&amp;rsquo;ll be training a (simplified version of the) logistic regression model in the &lt;a href="https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start">BQML quickstart guide&lt;/a>, using the publicly available Google Analytics sample dataset (which is a &lt;a href="https://cloud.google.com/bigquery/docs/partitioned-tables#dt_partition_shard">date-sharded table&lt;/a> - alternatively, you might encounter &lt;a href="https://cloud.google.com/bigquery/docs/partitioned-tables">partitioned tables&lt;/a>). An overview of all models you can create using BQML can be found &lt;a href="https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in">here&lt;/a>.&lt;/p>
&lt;p>After creating a BigQuery dataset, create the model, which is fully defined in SQL:&lt;/p>
&lt;pre tabindex="0">&lt;code>CREATE MODEL IF NOT EXISTS `bqml_tutorial.sample_model`
OPTIONS(model_type=&amp;#39;logistic_reg&amp;#39;, input_label_cols=[&amp;#34;label&amp;#34;]) AS
SELECT
IF(totals.transactions IS NULL, 0, 1) AS label,
IFNULL(geoNetwork.country, &amp;#34;&amp;#34;) AS country
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN &amp;#39;20160801&amp;#39; AND &amp;#39;20170630&amp;#39;
&lt;/code>&lt;/pre>&lt;p>The model predicts whether a visitor will make a purchase, given the visitor&amp;rsquo;s country. It is trained on data gathered between 2016-08-01 and 2017-06-30.&lt;/p>
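&lt;p>In the query above, the wildcard table &lt;code>ga_sessions_*&lt;/code> is narrowed with &lt;code>_TABLE_SUFFIX BETWEEN&lt;/code>, which is inclusive at both ends. Because the suffixes are fixed-width &lt;code>YYYYMMDD&lt;/code> strings, comparing them as strings follows date order. The sketch below lists the suffixes in that range, assuming one shard per calendar day; &lt;code>shard_suffixes&lt;/code> is a hypothetical helper, and it does not check which daily tables actually exist in the dataset.&lt;/p>

```python
from datetime import date, timedelta


def shard_suffixes(first_day, last_day):
    """List the YYYYMMDD suffixes of daily shards from first_day to last_day, inclusive.

    Hypothetical helper for illustration; it does not query BigQuery.
    """
    day_count = (last_day - first_day).days + 1
    return [
        (first_day + timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range(day_count)
    ]


suffixes = shard_suffixes(date(2016, 8, 1), date(2017, 6, 30))
print(suffixes[0], suffixes[-1], len(suffixes))
```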
&lt;h2 id="export-your-bigquery-ml-model">Export your BigQuery ML model&lt;/h2>
&lt;p>In order to incorporate your model in an Apache Beam pipeline, you will need to export it. Prerequisites to do so are &lt;a href="https://cloud.google.com/bigquery/docs/bq-command-line-tool">installing the &lt;code>bq&lt;/code> command-line tool&lt;/a> and &lt;a href="https://cloud.google.com/storage/docs/creating-buckets">creating a Google Cloud Storage bucket&lt;/a> to store your exported model.&lt;/p>
&lt;p>Export the model using the following command:&lt;/p>
&lt;pre tabindex="0">&lt;code>bq extract -m bqml_tutorial.sample_model gs://some/gcs/path
&lt;/code>&lt;/pre>&lt;h2 id="create-an-apache-beam-transform-that-uses-your-bigquery-ml-model">Create an Apache Beam transform that uses your BigQuery ML model&lt;/h2>
&lt;p>In this section we will construct an Apache Beam pipeline that will use the BigQuery ML model we just created and exported. The model can be served using Google Cloud AI Platform Prediction - for this please refer to the &lt;a href="/documentation/patterns/ai-platform/">AI Platform patterns&lt;/a>. In this case, we&amp;rsquo;ll be illustrating how to use the tfx_bsl library to do local predictions (on your Apache Beam workers).&lt;/p>
&lt;p>First, the model needs to be downloaded to a local directory where you will be developing the rest of your pipeline (e.g. to &lt;code>serving_dir/sample_model/1&lt;/code>).&lt;/p>
&lt;p>Then, you can start developing your pipeline like you would normally do. We will be using the &lt;code>RunInference&lt;/code> PTransform from the &lt;a href="https://github.com/tensorflow/tfx-bsl">tfx_bsl&lt;/a> library, and we will point it to our local directory where the model is stored (see the &lt;code>model_path&lt;/code> variable in the code example). The transform takes elements of the type &lt;code>tf.train.Example&lt;/code> as inputs and outputs elements of the type &lt;a href="https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/prediction_log.proto">&lt;code>tensorflow_serving.apis.prediction_log_pb2.PredictionLog&lt;/code>&lt;/a>. Depending on the signature of your model, you can extract values from the output; in our case we extract &lt;code>label_probs&lt;/code>, &lt;code>label_values&lt;/code> and the &lt;code>predicted_label&lt;/code> as per the &lt;a href="https://cloud.google.com/bigquery-ml/docs/exporting-models#logistic_reg">docs on the logistic regression model&lt;/a> in the &lt;code>extract_prediction&lt;/code> function.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">tensorflow&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">tf&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">google.protobuf&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">text_format&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">tensorflow.python.framework&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">tensor_util&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">tfx_bsl.beam&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">run_inference&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">tfx_bsl.public.beam&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">RunInference&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">tfx_bsl.public.proto&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">model_spec_pb2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">inputs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">train&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Example&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">features&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">train&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Features&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">feature&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="c1"># The model above is trained on the country feature only.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="s1">&amp;#39;country&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">train&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Feature&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bytes_list&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">tf&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">train&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">BytesList&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">value&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="sa">b&lt;/span>&lt;span class="s2">&amp;#34;United States&amp;#34;&lt;/span>&lt;span class="p">]))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">model_path&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;serving_dir/sample_model/1&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">extract_prediction&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">response&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="c1"># Parenthesize the values so that all three are yielded as one tuple.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">yield&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">predict_log&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">outputs&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;label_values&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">string_val&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">tensor_util&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MakeNdarray&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">predict_log&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">outputs&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;label_probs&amp;#39;&lt;/span>&lt;span class="p">]),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">predict_log&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">response&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">outputs&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;predicted_label&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">string_val&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">res&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">p&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="n">inputs&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="o">|&lt;/span> &lt;span class="n">RunInference&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="n">model_spec_pb2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">InferenceSpecType&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                &lt;span class="n">saved_model_spec&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">model_spec_pb2&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">SavedModelSpec&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                    &lt;span class="n">model_path&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">model_path&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                    &lt;span class="n">signature_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;serving_default&amp;#39;&lt;/span>&lt;span class="p">])))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">extract_prediction&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Implemented&lt;/span> &lt;span class="n">in&lt;/span> &lt;span class="n">Python&lt;/span>&lt;span class="o">.&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div></description></item><item><title>Documentation: BigQuery patterns</title><link>/documentation/patterns/bigqueryio/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/patterns/bigqueryio/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="google-bigquery-patterns">Google BigQuery patterns&lt;/h1>
&lt;p>The samples on this page show you common patterns for use with BigQueryIO.&lt;/p>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="bigqueryio-deadletter-pattern">BigQueryIO deadletter pattern&lt;/h2>
&lt;p>In production systems, it is useful to implement the deadletter pattern with BigQueryIO: any elements that BigQueryIO fails to process are output into a separate PCollection for further processing.
The samples below print the errors, but in a production system they can be sent to a deadletter table for later correction.&lt;/p>
&lt;p class="language-java">When using &lt;code>STREAMING_INSERTS&lt;/code> you can use the &lt;code>WriteResult&lt;/code> object to access a &lt;code>PCollection&lt;/code> with the &lt;code>TableRows&lt;/code> that failed to be inserted into BigQuery.
If you also set the &lt;code>withExtendedErrorInfo&lt;/code> property , you will be able to access a &lt;code>PCollection&amp;lt;BigQueryInsertError&amp;gt;&lt;/code> from the &lt;code>WriteResult&lt;/code>. The &lt;code>PCollection&lt;/code> will then include a reference to the table, the data row and the &lt;code>InsertErrors&lt;/code>. Which errors are added to the deadletter queue is determined via the &lt;code>InsertRetryPolicy&lt;/code>.&lt;/p>
&lt;p class="language-py">In the result tuple you can access &lt;code>FailedRows&lt;/code> to access the failed inserts.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PipelineOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">fromArgs&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">args&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withValidation&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">as&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">BigQueryOptions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Pipeline&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Create a bug by writing the 2nd value as null. The API will correctly
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// throw an error when trying to insert a null value into a REQUIRED field.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">WriteResult&lt;/span> &lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">2&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">BigQueryIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withSchema&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">TableSchema&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">setFields&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ImmutableList&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">TableFieldSchema&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">setName&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;num&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">setType&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;INTEGER&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">setMode&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;REQUIRED&amp;#34;&lt;/span>&lt;span class="o">))))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Test.dummyTable&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withFormatFunction&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">x&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">TableRow&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">set&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;num&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">x&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">2&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">?&lt;/span> &lt;span class="kc">null&lt;/span> &lt;span class="o">:&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withFailedInsertRetryPolicy&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">InsertRetryPolicy&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">retryTransientErrors&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Forcing the bounded pipeline to use streaming inserts
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">withMethod&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">BigQueryIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Write&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Method&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">STREAMING_INSERTS&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// set the withExtendedErrorInfo property.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">withExtendedErrorInfo&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCreateDisposition&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">BigQueryIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Write&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">CreateDisposition&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">CREATE_IF_NEEDED&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withWriteDisposition&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">BigQueryIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Write&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">WriteDisposition&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">WRITE_APPEND&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">result&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">getFailedInsertsWithErr&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MapElements&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">into&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">TypeDescriptors&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">strings&lt;/span>&lt;span class="o">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">via&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">System&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">println&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34; The table was &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getTable&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">System&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">println&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34; The row was &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getRow&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">System&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">out&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">println&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34; The error was &amp;#34;&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getError&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="s">&amp;#34;&amp;#34;&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">run&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="cm">/* Sample Output From the pipeline:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> &amp;lt;p&amp;gt;The table was GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=Test,projectId=&amp;lt;&amp;gt;, tableId=dummyTable}}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> &amp;lt;p&amp;gt;The row was GenericData{classInfo=[f], {num=null}}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> &amp;lt;p&amp;gt;The error was GenericData{classInfo=[errors, index],{errors=[GenericData{classInfo=[debugInfo, location, message, reason], {debugInfo=,location=, message=Missing required field: Msg_0_CLOUD_QUERY_TABLE.num., reason=invalid}}],index=0}}
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cm"> */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># Create pipeline.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">schema&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">({&lt;/span>&lt;span class="s1">&amp;#39;fields&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[{&lt;/span>&lt;span class="s1">&amp;#39;name&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;a&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;type&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;STRING&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;mode&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s1">&amp;#39;REQUIRED&amp;#39;&lt;/span>&lt;span class="p">}]})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">errors&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Data&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;CreateBrokenData&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">src&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{&lt;/span>&lt;span class="s1">&amp;#39;a&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">src&lt;/span>&lt;span class="p">}&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="n">src&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">2&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="p">{&lt;/span>&lt;span class="s1">&amp;#39;a&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="kc">None&lt;/span>&lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;WriteToBigQuery&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">io&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">WriteToBigQuery&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;lt;Your Project:Test.dummy_a_table&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">schema&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">schema&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">insert_retry_strategy&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;RETRY_ON_TRANSIENT_ERROR&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">create_disposition&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;CREATE_IF_NEEDED&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">write_disposition&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;WRITE_APPEND&amp;#39;&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">errors&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;FailedRows&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;PrintErrors&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">FlatMap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">err&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Error Found &lt;/span>&lt;span class="si">{}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">format&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">err&lt;/span>&lt;span class="p">))))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
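&lt;p>The sample above only prints the failed rows. As a minimal sketch of the step that would precede a write to a deadletter table, the helper below flattens one failed insert into a record. It assumes that each element of &lt;code>FailedRows&lt;/code> is a &lt;code>(destination, row)&lt;/code> pair; the function name &lt;code>to_deadletter_record&lt;/code> and the output field names are hypothetical and not part of any Beam API.&lt;/p>

```python
import json
from datetime import datetime, timezone


def to_deadletter_record(failed_row, failed_at=None):
    """Turn one failed insert into a flat record for a deadletter table.

    Assumption: failed_row is a (destination, row) pair. The output field
    names are illustrative only and not defined by Beam.
    """
    destination, row = failed_row[0], failed_row[1]
    if failed_at is None:
        failed_at = datetime.now(timezone.utc)
    return {
        'destination': str(destination),
        # Store the payload as JSON text so rows of any schema fit one table.
        'payload': json.dumps(row, sort_keys=True, default=str),
        'failed_at': failed_at.isoformat(),
    }


# Example with the broken row produced by the sample pipeline.
record = to_deadletter_record(
    ('Test.dummy_a_table', {'a': None}),
    failed_at=datetime(2024, 1, 1, tzinfo=timezone.utc))
print(record)
```

&lt;p>In a pipeline, a function like this could be applied with &lt;code>beam.Map&lt;/code> to &lt;code>errors['FailedRows']&lt;/code> before a second write. Keeping the payload as JSON text is one possible design choice: it lets a single deadletter table hold failures from tables with different schemas.&lt;/p>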
&lt;/div></description></item><item><title>Documentation: Cdap IO</title><link>/documentation/io/built-in/cdap/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/io/built-in/cdap/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="cdap-io">Cdap IO&lt;/h1>
&lt;p>&lt;code>CdapIO&lt;/code> is a transform for reading data from a CDAP source plugin or writing data to a CDAP sink plugin.&lt;/p>
&lt;h2 id="batch-plugins-support">Batch plugins support&lt;/h2>
&lt;p>&lt;code>CdapIO&lt;/code> currently supports the following CDAP Batch plugins, which you can reference by plugin class name:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/data-integrations/hubspot/blob/develop/src/main/java/io/cdap/plugin/hubspot/source/batch/HubspotBatchSource.java">Hubspot Batch Source&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/data-integrations/hubspot/blob/develop/src/main/java/io/cdap/plugin/hubspot/sink/batch/HubspotBatchSink.java">Hubspot Batch Sink&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/data-integrations/salesforce/blob/develop/src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/SalesforceBatchSource.java">Salesforce Batch Source&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/data-integrations/salesforce/blob/develop/src/main/java/io/cdap/plugin/salesforce/plugin/sink/batch/SalesforceBatchSink.java">Salesforce Batch Sink&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/data-integrations/servicenow-plugins/blob/develop/src/main/java/io/cdap/plugin/servicenow/source/ServiceNowSource.java">ServiceNow Batch Source&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/data-integrations/zendesk/blob/develop/src/main/java/io/cdap/plugin/zendesk/source/batch/ZendeskBatchSource.java">Zendesk Batch Source&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>You can also use any other CDAP Batch plugin based on Hadoop&amp;rsquo;s &lt;code>InputFormat&lt;/code> or &lt;code>OutputFormat&lt;/code>. Such plugins can be added to the list of plugins supported by class name. For more details, see the &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/io/cdap/README.md">CdapIO readme&lt;/a>.&lt;/p>
&lt;h2 id="streaming-plugins-support">Streaming plugins support&lt;/h2>
&lt;p>&lt;code>CdapIO&lt;/code> currently supports CDAP Streaming plugins based on &lt;a href="https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html">Apache Spark Receiver&lt;/a>.&lt;/p>
&lt;p>Requirements for CDAP Streaming plugins:&lt;/p>
&lt;ul>
&lt;li>The CDAP Streaming plugin should be based on &lt;code>Spark Receiver&lt;/code> (Spark 2.4).&lt;/li>
&lt;li>The CDAP Streaming plugin should support working with offsets.&lt;/li>
&lt;li>The corresponding Spark Receiver should implement the &lt;a href="https://github.com/apache/beam/blob/master/sdks/java/io/sparkreceiver/2/src/main/java/org/apache/beam/sdk/io/sparkreceiver/HasOffset.java">HasOffset&lt;/a> interface.&lt;/li>
&lt;li>Records should have a numeric field that represents the record offset.&lt;/li>
&lt;/ul>
&lt;h2 id="batch-reading-using-cdapio">Batch reading using CdapIO&lt;/h2>
&lt;p>To read from a CDAP plugin, you need to pass:&lt;/p>
&lt;ul>
&lt;li>&lt;code>Key&lt;/code> and &lt;code>Value&lt;/code> classes. Check that a Beam Coder is available for these classes.&lt;/li>
&lt;li>A &lt;code>PluginConfig&lt;/code> object with the parameters for the specific CDAP plugin.&lt;/li>
&lt;/ul>
&lt;p>You can build a &lt;code>PluginConfig&lt;/code> object using the &lt;code>ConfigWrapper&lt;/code> class by specifying:&lt;/p>
&lt;ul>
&lt;li>The class of the needed &lt;code>PluginConfig&lt;/code>.&lt;/li>
&lt;li>A &lt;code>Map&amp;lt;String, Object&amp;gt;&lt;/code> of parameters for the corresponding CDAP plugin.&lt;/li>
&lt;/ul>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">Map&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">myPluginConfigParams&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">HashMap&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Read plugin parameters (e.g. from PipelineOptions) and put them into &amp;#39;myPluginConfigParams&amp;#39; map.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">myPluginConfigParams&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">put&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MyPluginConstants&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">USERNAME_PARAMETER_NAME&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">pipelineOptions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getUsername&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// ...
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">MyPluginConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">MyPluginConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">myPluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="read-data-by-plugin-class-name">Read data by plugin class name&lt;/h3>
&lt;p>Some CDAP plugins are already supported and can be used by specifying only the plugin class name.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">JsonElement&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">JsonElement&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPluginClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">HubspotBatchSource&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JsonElement&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="read-data-with-building-batch-plugin">Read data by building a Batch Plugin&lt;/h3>
&lt;p>If a CDAP plugin is not supported by plugin class name, you can build a &lt;code>Plugin&lt;/code> object by passing the following parameters:&lt;/p>
&lt;ul>
&lt;li>The class of the CDAP batch plugin.&lt;/li>
&lt;li>The &lt;code>InputFormat&lt;/code> class used to connect to your CDAP plugin of choice.&lt;/li>
&lt;li>The &lt;code>InputFormatProvider&lt;/code> class used to provide &lt;code>InputFormat&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>You can then pass this &lt;code>Plugin&lt;/code> object to &lt;code>CdapIO&lt;/code>.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPlugin&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Plugin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">createBatch&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyCdapPlugin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyInputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyInputFormatProvider&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="examples-for-specific-cdap-plugins">Examples for specific CDAP plugins&lt;/h3>
&lt;h4 id="cdap-hubspot-batch-source-plugin">CDAP Hubspot Batch Source plugin&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">SourceHubspotConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">SourceHubspotConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">JsonElement&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">JsonElement&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPluginClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">HubspotBatchSource&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JsonElement&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;readFromHubspotPlugin&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="cdap-salesforce-batch-source-plugin">CDAP Salesforce Batch Source plugin&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">SalesforceSourceConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">SalesforceSourceConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Schema&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LinkedHashMap&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">Schema&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">LinkedHashMap&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPluginClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">SalesforceBatchSource&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Schema&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">LinkedHashMap&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;readFromSalesforcePlugin&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="cdap-servicenow-batch-source-plugin">CDAP ServiceNow Batch Source plugin&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">ServiceNowSourceConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">ServiceNowSourceConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">StructuredRecord&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">StructuredRecord&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPluginClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ServiceNowSource&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">StructuredRecord&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;readFromServiceNowPlugin&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="cdap-zendesk-batch-source-plugin">CDAP Zendesk Batch Source plugin&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">ZendeskBatchSourceConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">ZendeskBatchSourceConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">StructuredRecord&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">StructuredRecord&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPluginClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ZendeskBatchSource&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">StructuredRecord&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;readFromZendeskPlugin&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>To learn more, check out the &lt;a href="https://github.com/apache/beam/tree/master/examples/java/cdap">complete examples&lt;/a>.&lt;/p>
&lt;h2 id="batch-writing-using-cdapio">Batch writing using CdapIO&lt;/h2>
&lt;p>To write to a CDAP plugin, you need to pass:&lt;/p>
&lt;ul>
&lt;li>&lt;code>Key&lt;/code> and &lt;code>Value&lt;/code> classes. Check that a Beam Coder is available for these classes.&lt;/li>
&lt;li>&lt;code>locksDirPath&lt;/code>, the path of the directory where locks are stored. This parameter is required for Hadoop External Synchronization (the mechanism for acquiring locks related to the write job).&lt;/li>
&lt;li>A &lt;code>PluginConfig&lt;/code> object with the parameters for the specific CDAP plugin.&lt;/li>
&lt;/ul>
&lt;p>You can build a &lt;code>PluginConfig&lt;/code> object using the &lt;code>ConfigWrapper&lt;/code> class by specifying:&lt;/p>
&lt;ul>
&lt;li>The class of the required &lt;code>PluginConfig&lt;/code>.&lt;/li>
&lt;li>A &lt;code>Map&amp;lt;String, Object&amp;gt;&lt;/code> of parameters for the corresponding CDAP plugin.&lt;/li>
&lt;/ul>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">MyPluginConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">MyPluginConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
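&lt;p>The &lt;code>pluginConfigParams&lt;/code> map used above is not shown being built. A minimal sketch of one way to assemble such a map with plain Java collections follows. The class name, method name, and the keys (&lt;code>referenceName&lt;/code>, &lt;code>apiKey&lt;/code>, &lt;code>batchSize&lt;/code>) are hypothetical placeholders, not the parameters of any real plugin; check the configuration class of your CDAP plugin for the names it expects.&lt;/p>

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class PluginConfigParamsExample {

  /**
   * Builds a parameters map of the shape Map<String, Object>.
   * All keys below are hypothetical and only illustrate the structure.
   */
  static Map<String, Object> buildPluginConfigParams(String referenceName, String apiKey) {
    // LinkedHashMap keeps insertion order, which makes the map easier to log and debug.
    Map<String, Object> params = new LinkedHashMap<>();
    params.put("referenceName", referenceName); // hypothetical key
    params.put("apiKey", apiKey);               // hypothetical key
    params.put("batchSize", 100);               // hypothetical key; values may be non-String objects
    // Return a read-only view so later pipeline code cannot mutate the configuration.
    return Collections.unmodifiableMap(params);
  }

  public static void main(String[] args) {
    Map<String, Object> pluginConfigParams = buildPluginConfigParams("my-plugin", "my-api-key");
    System.out.println(pluginConfigParams.keySet());
  }
}
```

&lt;p>A map built this way would then be passed to &lt;code>withParams(pluginConfigParams)&lt;/code> as in the snippet above.&lt;/p>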
&lt;h3 id="write-data-by-plugin-class-name">Write data by plugin class name&lt;/h3>
&lt;p>Some CDAP plugins are already supported and can be used just by plugin class name.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Write&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">writeTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPluginClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">HubspotBatchSink&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withLocksDirPath&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">locksDirPath&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;write&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">writeTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="write-data-with-building-batch-plugin">Write data by building a Batch Plugin&lt;/h3>
&lt;p>If a CDAP plugin is not supported by plugin class name, you can build a &lt;code>Plugin&lt;/code> object by passing the following parameters:&lt;/p>
&lt;ul>
&lt;li>The class of the CDAP plugin.&lt;/li>
&lt;li>The &lt;code>OutputFormat&lt;/code> class used to connect to your CDAP plugin of choice.&lt;/li>
&lt;li>The &lt;code>OutputFormatProvider&lt;/code> class used to provide &lt;code>OutputFormat&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>You can then pass this &lt;code>Plugin&lt;/code> object to &lt;code>CdapIO&lt;/code>.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Write&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">writeTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPlugin&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Plugin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">createBatch&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyCdapPlugin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyOutputFormat&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyOutputFormatProvider&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withLocksDirPath&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">locksDirPath&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;write&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">writeTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="examples-for-specific-cdap-plugins-1">Examples for specific CDAP plugins&lt;/h3>
&lt;h4 id="cdap-hubspot-batch-sink-plugin">CDAP Hubspot Batch Sink plugin&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">SinkHubspotConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">SinkHubspotConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Write&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">writeTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPluginClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">HubspotBatchSink&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withLocksDirPath&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">locksDirPath&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;writeToHubspotPlugin&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">writeTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="cdap-salesforce-batch-sink-plugin">CDAP Salesforce Batch Sink plugin&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">SalesforceSinkConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">SalesforceSinkConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Write&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CSVRecord&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">writeTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CSVRecord&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPluginClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginClass&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">CSVRecord&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withLocksDirPath&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">locksDirPath&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;writeToSalesforcePlugin&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">writeTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>To learn more, see the &lt;a href="https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap">complete examples&lt;/a>.&lt;/p>
&lt;h2 id="streaming-reading-using-cdapio">Streaming reading using CdapIO&lt;/h2>
&lt;p>To read from a CDAP plugin, you need to pass:&lt;/p>
&lt;ul>
&lt;li>The &lt;code>Key&lt;/code> and &lt;code>Value&lt;/code> classes. Check that a Beam Coder is available for each of these classes.&lt;/li>
&lt;li>A &lt;code>PluginConfig&lt;/code> object with the parameters for the specific CDAP plugin.&lt;/li>
&lt;/ul>
&lt;p>You can build the &lt;code>PluginConfig&lt;/code> object with the &lt;code>ConfigWrapper&lt;/code> class by specifying:&lt;/p>
&lt;ul>
&lt;li>The class of the required &lt;code>PluginConfig&lt;/code>.&lt;/li>
&lt;li>A &lt;code>Map&amp;lt;String, Object&amp;gt;&lt;/code> of parameters for the corresponding CDAP plugin.&lt;/li>
&lt;/ul>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">MyPluginConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">MyPluginConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
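&lt;p>The example above passes a &lt;code>pluginConfigParams&lt;/code> map without showing how it is built. The following is a minimal sketch of assembling such a &lt;code>Map&amp;lt;String, Object&amp;gt;&lt;/code> with the standard library only. The class name, keys, and values are hypothetical placeholders; the keys a real plugin expects depend on its own &lt;code>PluginConfig&lt;/code> class.&lt;/p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: assembles a parameters map of the kind passed to withParams(...). */
class PluginConfigParamsSketch {

  /**
   * Builds the parameters map. Every key and value here is a hypothetical
   * placeholder; consult the plugin's PluginConfig class for the real names.
   */
  static Map<String, Object> buildParams() {
    // LinkedHashMap keeps insertion order, which makes logged output predictable.
    Map<String, Object> params = new LinkedHashMap<>();
    params.put("referenceName", "my-reference"); // hypothetical key
    params.put("objectType", "Contacts");        // hypothetical key
    params.put("batchSize", 100);                // values may be any Object
    return params;
  }

  public static void main(String[] args) {
    buildParams().forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```

&lt;p>The resulting map is what the snippets in this section refer to as &lt;code>pluginConfigParams&lt;/code>.&lt;/p>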
&lt;h3 id="read-data-by-plugin-class-name-1">Read data by plugin class name&lt;/h3>
&lt;p>Some CDAP plugins are already supported and can be used by specifying only the plugin class name.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPluginClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MyStreamingPlugin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="read-data-with-building-streaming-plugin">Read data by building a Streaming Plugin&lt;/h3>
&lt;p>If a CDAP plugin is not supported by plugin class name, you can build a &lt;code>Plugin&lt;/code> object by passing the following parameters:&lt;/p>
&lt;ul>
&lt;li>The class of the CDAP Streaming plugin.&lt;/li>
&lt;li>&lt;code>getOffsetFn&lt;/code>, a &lt;code>SerializableFunction&lt;/code> that defines how to get the &lt;code>Long&lt;/code> offset of a record.&lt;/li>
&lt;li>&lt;code>receiverClass&lt;/code>, the Spark (v2.4) &lt;code>Receiver&lt;/code> class associated with the CDAP plugin.&lt;/li>
&lt;li>(Optional) &lt;code>getReceiverArgsFromConfigFn&lt;/code>, a &lt;code>SerializableFunction&lt;/code> that defines how to get the constructor arguments for the Spark &lt;code>Receiver&lt;/code> from the &lt;code>PluginConfig&lt;/code> object.&lt;/li>
&lt;/ul>
&lt;p>You can then pass this &lt;code>Plugin&lt;/code> object to &lt;code>CdapIO&lt;/code>.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPlugin&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Plugin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">createStreaming&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyStreamingPlugin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">myGetOffsetFn&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">MyReceiver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">myGetReceiverArgsFromConfigFn&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
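&lt;p>The &lt;code>myGetOffsetFn&lt;/code> argument above is a placeholder for a function that maps a record to its &lt;code>Long&lt;/code> offset. The following is a minimal sketch of that mapping, written against &lt;code>java.util.function.Function&lt;/code> so that it runs without Beam on the classpath; in a pipeline the same lambda body would be supplied as a &lt;code>SerializableFunction&lt;/code>. The record layout (&lt;code>offset:payload&lt;/code>) and the class name are assumptions made for illustration only.&lt;/p>

```java
import java.util.function.Function;

/** Sketch only: extracts a Long offset from a record with an assumed layout. */
class OffsetFnSketch {

  // Assumed (hypothetical) record layout: "offset:payload", for example "42:hello".
  static final Function<String, Long> GET_OFFSET_FN =
      record -> {
        int separator = record.indexOf(':');
        if (separator < 0) {
          throw new IllegalArgumentException("Record has no offset prefix: " + record);
        }
        // Everything before the first ':' is parsed as the offset.
        return Long.parseLong(record.substring(0, separator));
      };

  public static void main(String[] args) {
    System.out.println(GET_OFFSET_FN.apply("42:hello")); // prints 42
  }
}
```

&lt;p>Real plugins usually carry the offset in a structured field rather than a string prefix, so the parsing logic depends on the record type of the plugin.&lt;/p>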
&lt;h3 id="read-data-with-optional-parameters">Read data with optional parameters&lt;/h3>
&lt;p>You can also pass the following optional parameters:&lt;/p>
&lt;ul>
&lt;li>&lt;code>pullFrequencySec&lt;/code>, the delay in seconds between polls for new records.&lt;/li>
&lt;li>&lt;code>startOffset&lt;/code>, the inclusive offset from which reading starts.&lt;/li>
&lt;/ul>
&lt;p>For example:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPluginClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">MyStreamingPlugin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPullFrequencySec&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1L&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withStartOffset&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1L&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;read&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="examples-for-specific-cdap-plugins-2">Examples for specific CDAP plugins&lt;/h3>
&lt;h4 id="cdap-hubspot-streaming-source-plugin">CDAP Hubspot Streaming Source plugin&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">HubspotStreamingSourceConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">HubspotStreamingSourceConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPlugin&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Plugin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">createStreaming&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HubspotStreamingSource&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">GetOffsetUtils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getOffsetFnForHubspot&lt;/span>&lt;span class="o">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">HubspotReceiver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;readFromHubspotPlugin&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h4 id="cdap-salesforce-streaming-source-plugin">CDAP Salesforce Streaming Source plugin&lt;/h4>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">SalesforceStreamingSourceConfig&lt;/span> &lt;span class="n">pluginConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">ConfigWrapper&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;(&lt;/span>&lt;span class="n">SalesforceStreamingSourceConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withParams&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfigParams&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">build&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Read&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">readTransform&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CdapIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withCdapPlugin&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Plugin&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">createStreaming&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">SalesforceStreamingSource&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">GetOffsetUtils&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getOffsetFnForSalesforce&lt;/span>&lt;span class="o">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">SalesforceReceiver&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">config&lt;/span> &lt;span class="o">-&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">SalesforceStreamingSourceConfig&lt;/span> &lt;span class="n">salesforceConfig&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">(&lt;/span>&lt;span class="n">SalesforceStreamingSourceConfig&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="n">config&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">Object&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">salesforceConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getAuthenticatorCredentials&lt;/span>&lt;span class="o">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">salesforceConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getPushTopicName&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withPluginConfig&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">pluginConfig&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withKeyClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">NullWritable&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">withValueClass&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">class&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;readFromSalesforcePlugin&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">readTransform&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>To learn more, see the &lt;a href="https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap">complete examples&lt;/a>.&lt;/p></description></item><item><title>Documentation: CoGroupByKey</title><link>/documentation/transforms/java/aggregation/cogroupbykey/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/java/aggregation/cogroupbykey/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="cogroupbykey">CoGroupByKey&lt;/h1>
&lt;table align="left">
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/join/CoGroupByKey.html">
&lt;img src="/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
&lt;/a>
&lt;/table>
&lt;br>&lt;br>
&lt;p>Aggregates all input elements by their key and allows downstream processing
to consume all values associated with the key. While &lt;code>GroupByKey&lt;/code> performs
this operation over a single input collection and thus a single type of
input values, &lt;code>CoGroupByKey&lt;/code> operates over multiple input collections.
Consequently, the result for each key is a tuple of the values associated with
that key in each input collection.&lt;/p>
&lt;p>See more information in the &lt;a href="/documentation/programming-guide/#cogroupbykey">Beam Programming Guide&lt;/a>.&lt;/p>
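&lt;p>To make the shape of the result concrete, the following is a minimal plain-Java sketch of the grouping that &lt;code>CoGroupByKey&lt;/code> computes for two keyed inputs. It does not use Beam and is not how Beam implements the transform; the class name, method names, and sample data are hypothetical. For each key, slot 0 holds the values from the first input and slot 1 holds the values from the second.&lt;/p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: in-memory illustration of co-grouping two keyed inputs. */
class CoGroupSketch {

  /** Returns, per key, a two-slot list: values from `first`, then values from `second`. */
  static Map<String, List<List<String>>> coGroup(
      List<Map.Entry<String, String>> first, List<Map.Entry<String, String>> second) {
    Map<String, List<List<String>>> grouped = new TreeMap<>();
    addAll(grouped, first, 0);
    addAll(grouped, second, 1);
    return grouped;
  }

  private static void addAll(
      Map<String, List<List<String>>> grouped,
      List<Map.Entry<String, String>> input,
      int slot) {
    for (Map.Entry<String, String> kv : input) {
      // A key seen in only one input still gets an (empty) slot for the other input.
      grouped.computeIfAbsent(kv.getKey(), key -> emptySlots()).get(slot).add(kv.getValue());
    }
  }

  private static List<List<String>> emptySlots() {
    List<List<String>> slots = new ArrayList<>();
    slots.add(new ArrayList<>());
    slots.add(new ArrayList<>());
    return slots;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, String>> emails =
        List.of(Map.entry("amy", "amy@example.com"), Map.entry("carl", "carl@example.com"));
    List<Map.Entry<String, String>> phones =
        List.of(Map.entry("amy", "111-222-3333"), Map.entry("james", "222-333-4444"));
    coGroup(emails, phones).forEach((name, slots) -> System.out.println(name + " -> " + slots));
  }
}
```

&lt;p>A key that appears in only one input still produces a result, with an empty collection for the other input. In this sketch the key &lt;code>james&lt;/code> has a phone number but no email address.&lt;/p>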
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>&lt;strong>Example 1&lt;/strong>: Say you have two different files with user data; one file has
names and email addresses and the other file has names and phone numbers.&lt;/p>
&lt;p>You can join those two data sets, using the username as a common key and the
other data as the associated values. After the join, you have one data set
that contains all of the information (email addresses and phone numbers)
associated with each name.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">UID&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">pt1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="cm">/* ... */&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">UID&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">pt2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="cm">/* ... */&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">t1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">t2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">UID&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KeyedPCollectionTuple&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">t1&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">pt1&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">t2&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">pt2&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">CoGroupByKey&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">result&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">UID&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">&amp;gt;,&lt;/span> &lt;span class="cm">/* some result */&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">UID&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">e&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">CoGbkResult&lt;/span> &lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">e&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getValue&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Retrieve all integers associated with this key from pt1
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Iterable&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">allIntegers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">result&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getAll&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">t1&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Retrieve the string associated with this key from pt2.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// Note: This will fail if multiple values had the same key in pt2.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">String&lt;/span> &lt;span class="n">string&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">result&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getOnly&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">t2&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">...&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>&lt;strong>Example 2:&lt;/strong>&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_GroupByKey"
data-show="main_section"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_GroupByKey%22%2c%22sdk%22%3a%22java%22%2c%22show%22%3a%22main_section%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/groupbykey">GroupByKey&lt;/a>
takes one input collection.&lt;/li>
&lt;/ul></description></item><item><title>Documentation: CoGroupByKey</title><link>/documentation/transforms/python/aggregation/cogroupbykey/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/cogroupbykey/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="cogroupbykey">CoGroupByKey&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.CoGroupByKey"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;p>Aggregates all input elements by their key and allows downstream processing
to consume all values associated with the key. While &lt;code>GroupByKey&lt;/code> performs
this operation over a single input collection and thus a single type of input
values, &lt;code>CoGroupByKey&lt;/code> operates over multiple input collections. Consequently,
the output for each key groups the values associated with that key separately
for each input collection.&lt;/p>
&lt;p>See more information in the &lt;a href="/documentation/programming-guide/#cogroupbykey">Beam Programming Guide&lt;/a>.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>In the following example, we create a pipeline with two &lt;code>PCollection&lt;/code>s of produce, one with icons and one with durations, both with a common key of the produce name.
Then, we apply &lt;code>CoGroupByKey&lt;/code> to join both &lt;code>PCollection&lt;/code>s using their keys.&lt;/p>
&lt;p>&lt;code>CoGroupByKey&lt;/code> expects a dictionary of named keyed &lt;code>PCollection&lt;/code>s, and produces elements joined by their keys.
The values of each output element are dictionaries where the names correspond to the input dictionary, with lists of all the values found for that key.&lt;/p>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">icon_pairs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Create icons&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Apple&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;🍎&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Apple&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;🍏&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Eggplant&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;🍆&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Tomato&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;🍅&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">duration_pairs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Create durations&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">([&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Apple&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;perennial&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Carrot&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;biennial&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Tomato&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;perennial&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Tomato&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;annual&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plants&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(({&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s1">&amp;#39;icons&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">icon_pairs&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;durations&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">duration_pairs&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Merge&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">CoGroupByKey&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">print&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="notebook-skip">Output:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>(&amp;#39;Apple&amp;#39;, {&amp;#39;icons&amp;#39;: [&amp;#39;🍎&amp;#39;, &amp;#39;🍏&amp;#39;], &amp;#39;durations&amp;#39;: [&amp;#39;perennial&amp;#39;]})
(&amp;#39;Carrot&amp;#39;, {&amp;#39;icons&amp;#39;: [], &amp;#39;durations&amp;#39;: [&amp;#39;biennial&amp;#39;]})
(&amp;#39;Tomato&amp;#39;, {&amp;#39;icons&amp;#39;: [&amp;#39;🍅&amp;#39;], &amp;#39;durations&amp;#39;: [&amp;#39;perennial&amp;#39;, &amp;#39;annual&amp;#39;]})
(&amp;#39;Eggplant&amp;#39;, {&amp;#39;icons&amp;#39;: [&amp;#39;🍆&amp;#39;], &amp;#39;durations&amp;#39;: []})&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
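&lt;p>To make the output shape concrete, the following is a minimal sketch in plain Python with no Beam dependency. The helper &lt;code>cogroup&lt;/code> is a hypothetical name used only for illustration; it is not part of the Beam API and says nothing about how a runner actually executes the join. It only mimics the result structure described above: every key maps to a dictionary with one list per named input, and an input that has no value for a key contributes an empty list.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-py" data-lang="py">from collections import defaultdict


def cogroup(named_inputs):
    """Hypothetical helper: group (key, value) pairs from named inputs by key."""
    names = list(named_inputs)
    # Every key starts with an empty list for each input name.
    grouped = defaultdict(lambda: {name: [] for name in names})
    for name, pairs in named_inputs.items():
        for key, value in pairs:
            grouped[key][name].append(value)
    return dict(grouped)


joined = cogroup({
    "icons": [("Apple", "🍎"), ("Apple", "🍏"), ("Eggplant", "🍆")],
    "durations": [("Apple", "perennial"), ("Carrot", "biennial")],
})
&lt;/code>&lt;/pre>
&lt;p>Here &lt;code>joined["Carrot"]&lt;/code> has an empty &lt;code>icons&lt;/code> list because only the durations input contains that key. In a real pipeline the order of the output elements, and of the values inside each list, should not be relied on.&lt;/p>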
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/combineglobally">CombineGlobally&lt;/a> to combine elements.&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/groupbykey">GroupByKey&lt;/a> takes one input collection.&lt;/li>
&lt;/ul>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p></description></item><item><title>Documentation: Combine</title><link>/documentation/transforms/java/aggregation/combine/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/java/aggregation/combine/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="combine">Combine&lt;/h1>
&lt;table align="left">
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Combine.html">
&lt;img src="/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
&lt;/a>
&lt;/table>
&lt;br>&lt;br>
&lt;p>A user-defined &lt;code>CombineFn&lt;/code> may be applied to combine all elements in a
&lt;code>PCollection&lt;/code> (global combine) or to combine all elements associated
with each key.&lt;/p>
&lt;p>While the result is similar to applying a &lt;code>GroupByKey&lt;/code> and then
aggregating the values in each &lt;code>Iterable&lt;/code>, the choice affects both
the code you must write and the performance of the pipeline.
Writing a &lt;code>ParDo&lt;/code> that counts the number of elements in each value
would be straightforward. However, as described in the execution
model, it would also require all values associated with each key to be
processed by a single worker, which introduces significant communication overhead.
Using a &lt;code>CombineFn&lt;/code> requires the code to be structured as an associative and
commutative operation, but it allows partial results (such as partial sums) to be
precomputed before they are merged.&lt;/p>
&lt;p>See more information in the &lt;a href="/documentation/programming-guide/#combine">Beam Programming Guide&lt;/a>.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>&lt;strong>Example 1&lt;/strong>: Global combine&lt;/p>
&lt;p>Use the global combine to combine all of the elements in a given &lt;code>PCollection&lt;/code>
into a single value, represented in your pipeline as a new &lt;code>PCollection&lt;/code> containing
one element. The following example code shows how to apply the Beam-provided
sum combine function to produce a single sum value for a &lt;code>PCollection&lt;/code> of integers.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Sum.ofIntegers() combines the elements in the input PCollection. The resulting PCollection, called sum,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// contains one value: the sum of all the elements in the input PCollection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">pc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">sum&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pc&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Combine&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">globally&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Sum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ofIntegers&lt;/span>&lt;span class="o">()));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>&lt;strong>Example 2&lt;/strong>: Keyed combine&lt;/p>
&lt;p>Use a keyed combine to combine all of the values associated with each key
into a single output value for each key. As with the global combine, the
function passed to a keyed combine must be associative and commutative.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// PCollection is grouped by key and the Double values associated with each key are combined into a Double.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">salesRecords&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">totalSalesPerPerson&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">salesRecords&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Combine&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">perKey&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Sum&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ofDoubles&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// The combined value is of a different type than the original collection of values per key. PCollection has
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// keys of type String and values of type Integer, and the combined value is a Double.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">playerAccuracy&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">avgAccuracyPerPlayer&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">playerAccuracy&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Combine&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Integer&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Double&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">perKey&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">MeanInts&lt;/span>&lt;span class="o">()));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>&lt;strong>Example 3&lt;/strong>:&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_Combine"
data-show="main_section"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_Combine%22%2c%22sdk%22%3a%22java%22%2c%22show%22%3a%22main_section%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/combinewithcontext">CombineWithContext&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/groupbykey">GroupByKey&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Documentation: CombineGlobally</title><link>/documentation/transforms/python/aggregation/combineglobally/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/combineglobally/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="combineglobally">CombineGlobally&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineGlobally"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;p>Combines all elements in a collection.&lt;/p>
&lt;p>See more information in the &lt;a href="/documentation/programming-guide/#combine">Beam Programming Guide&lt;/a>.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>In the following examples, we create a pipeline with a &lt;code>PCollection&lt;/code> of produce.
Then, we apply &lt;code>CombineGlobally&lt;/code> in multiple ways to combine all the elements in the &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>&lt;code>CombineGlobally&lt;/code> accepts a function that takes an &lt;code>iterable&lt;/code> of elements as an input, and combines them to return a single element.&lt;/p>
&lt;h3 id="example-1-combining-with-a-function">Example 1: Combining with a function&lt;/h3>
&lt;p>We define a function &lt;code>get_common_items&lt;/code> which takes an &lt;code>iterable&lt;/code> of sets as an input, and calculates the intersection (common items) of those sets.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombineGloballyFunction"
data-show="combineglobally_function"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombineGloballyFunction%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combineglobally_function%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h3 id="example-2-combining-with-a-lambda-function">Example 2: Combining with a lambda function&lt;/h3>
&lt;p>We can also use lambda functions to simplify &lt;strong>Example 1&lt;/strong>.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombineGloballyLambda"
data-show="combineglobally_lambda"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombineGloballyLambda%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combineglobally_lambda%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h3 id="example-3-combining-with-multiple-arguments">Example 3: Combining with multiple arguments&lt;/h3>
&lt;p>You can pass functions with multiple arguments to &lt;code>CombineGlobally&lt;/code>.
The extra arguments are supplied to &lt;code>CombineGlobally&lt;/code> as additional positional or keyword arguments and are forwarded to the function.&lt;/p>
&lt;p>In this example, the lambda function takes &lt;code>sets&lt;/code> and &lt;code>exclude&lt;/code> as arguments.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombineGloballyMultipleArguments"
data-show="combineglobally_multiple_arguments"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombineGloballyMultipleArguments%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combineglobally_multiple_arguments%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
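&lt;p>The shape of such a function can be sketched in plain Python without Beam. The name &lt;code>common_items_except&lt;/code> and the sample data are assumptions made for this illustration. The first parameter receives the iterable of elements to combine, and &lt;code>exclude&lt;/code> stands for the extra argument that would be given to &lt;code>CombineGlobally&lt;/code> as a keyword argument.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-py" data-lang="py">def common_items_except(sets, exclude):
    """Intersect all sets, then drop the excluded items."""
    sets = list(sets)  # the input may be any iterable, so materialize it once
    if not sets:
        # set.intersection needs at least one set; treat empty input as empty.
        return set()
    return set.intersection(*sets) - exclude


produce = [
    {"carrot", "tomato", "eggplant"},
    {"carrot", "tomato"},
    {"carrot", "tomato", "pepper"},
]

# Called directly here; in a pipeline the transform would supply the first
# argument and forward the extra keyword argument.
result = common_items_except(produce, exclude={"carrot"})
&lt;/code>&lt;/pre>
&lt;p>With this data, &lt;code>result&lt;/code> is the set containing only &lt;code>"tomato"&lt;/code>: it is the one item shared by every set that is not excluded.&lt;/p>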
&lt;h3 id="example-4-combining-with-a-combinefn">Example 4: Combining with a &lt;code>CombineFn&lt;/code>&lt;/h3>
&lt;p>The most general and flexible way to combine elements is with a class that inherits from &lt;code>CombineFn&lt;/code>. Such a class implements the following methods:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.create_accumulator">&lt;code>CombineFn.create_accumulator()&lt;/code>&lt;/a>:
This creates an empty accumulator.
For example, an empty accumulator for a sum would be &lt;code>0&lt;/code>, while an empty accumulator for a product (multiplication) would be &lt;code>1&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.add_input">&lt;code>CombineFn.add_input()&lt;/code>&lt;/a>:
Called &lt;em>once per element&lt;/em>.
Takes an accumulator and an input element, combines them and returns the updated accumulator.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.merge_accumulators">&lt;code>CombineFn.merge_accumulators()&lt;/code>&lt;/a>:
Accumulators for different subsets of the input can be built in parallel, so this function merges them into a single accumulator.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.extract_output">&lt;code>CombineFn.extract_output()&lt;/code>&lt;/a>:
Takes the final merged accumulator and performs any remaining calculations before returning the result.&lt;/p>
&lt;/li>
&lt;/ul>
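&lt;p>How the four methods fit together can be sketched in plain Python with no Beam dependency. Both &lt;code>AverageFn&lt;/code> and the driver &lt;code>simulate_combine&lt;/code> are hypothetical names for this illustration. A real combiner would subclass &lt;code>CombineFn&lt;/code>, and a runner decides how the input is split. The driver only assumes that each bundle of elements gets its own accumulator and that the partial accumulators are merged once at the end.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-py" data-lang="py">class AverageFn:
    """Mirrors the four CombineFn methods; does not subclass the Beam class."""

    def create_accumulator(self):
        # The empty accumulator: a running (sum, count) pair.
        return (0.0, 0)

    def add_input(self, accumulator, element):
        total, count = accumulator
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        accumulators = list(accumulators)
        total = sum(acc[0] for acc in accumulators)
        count = sum(acc[1] for acc in accumulators)
        return (total, count)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


def simulate_combine(fn, bundles):
    """Hypothetical driver: one accumulator per bundle, then a single merge."""
    partials = []
    for bundle in bundles:
        accumulator = fn.create_accumulator()
        for element in bundle:
            accumulator = fn.add_input(accumulator, element)
        partials.append(accumulator)
    return fn.extract_output(fn.merge_accumulators(partials))


mean = simulate_combine(AverageFn(), [[1, 2, 3], [4, 5], [6]])
&lt;/code>&lt;/pre>
&lt;p>Keeping a (sum, count) pair rather than a running average is what makes merging possible: two partial averages cannot be combined correctly without knowing how many elements each one covers.&lt;/p>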
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombineGloballyCombineFn"
data-show="combineglobally_combinefn"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombineGloballyCombineFn%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combineglobally_combinefn%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;p>You can use the following combiner transforms:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/combineperkey">CombinePerKey&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/combinevalues">CombineValues&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/mean">Mean&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/count">Count&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/top">Top&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/sample">Sample&lt;/a>&lt;/li>
&lt;/ul>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineGlobally"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p></description></item><item><title>Documentation: CombinePerKey</title><link>/documentation/transforms/python/aggregation/combineperkey/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/combineperkey/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="combineperkey">CombinePerKey&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombinePerKey"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;p>Combines all elements for each key in a collection.&lt;/p>
&lt;p>See more information in the &lt;a href="/documentation/programming-guide/#combine">Beam Programming Guide&lt;/a>.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>In the following examples, we create a pipeline with a &lt;code>PCollection&lt;/code> of produce.
Then, we apply &lt;code>CombinePerKey&lt;/code> in multiple ways to combine the values for each key in the &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>&lt;code>CombinePerKey&lt;/code> accepts a function that takes a list of values as an input, and combines them for each key.&lt;/p>
&lt;h3 id="example-1-combining-with-a-predefined-function">Example 1: Combining with a predefined function&lt;/h3>
&lt;p>We use the function
&lt;a href="https://docs.python.org/3/library/functions.html#sum">&lt;code>sum&lt;/code>&lt;/a>
which takes an &lt;code>iterable&lt;/code> of numbers and adds them together.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombinePerKeySimple"
data-show="combineperkey_simple"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombinePerKeySimple%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combineperkey_simple%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h3 id="example-2-combining-with-a-function">Example 2: Combining with a function&lt;/h3>
&lt;p>We define a function &lt;code>saturated_sum&lt;/code> which takes an &lt;code>iterable&lt;/code> of numbers and adds them together, up to a predefined maximum number.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombinePerKeyFunction"
data-show="combineperkey_function"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombinePerKeyFunction%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combineperkey_function%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
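&lt;p>As a rough local illustration of what this example computes, the sketch below applies a &lt;code>saturated_sum&lt;/code> to each key in plain Python. The maximum of &lt;code>8&lt;/code>, the sample produce data, and the &lt;code>combine_per_key&lt;/code> helper are assumptions made for the sketch; in a pipeline, &lt;code>CombinePerKey&lt;/code> does the grouping for you.&lt;/p>

```python
from collections import defaultdict


def saturated_sum(values):
    # Add the values together, but never report more than max_value.
    max_value = 8  # assumed cap, chosen only for this sketch
    return min(sum(values), max_value)


def combine_per_key(pairs, fn):
    """Hypothetical local stand-in for CombinePerKey: group, then combine."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: fn(values) for key, values in grouped.items()}


produce = [
    ("carrot", 3),
    ("carrot", 2),
    ("eggplant", 1),
    ("tomato", 4),
    ("tomato", 5),
    ("tomato", 3),
]
totals = combine_per_key(produce, saturated_sum)
print(totals)
```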
&lt;h3 id="example-3-combining-with-a-lambda-function">Example 3: Combining with a lambda function&lt;/h3>
&lt;p>We can also use lambda functions to simplify &lt;strong>Example 2&lt;/strong>.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombinePerKeyLambda"
data-show="combineperkey_lambda"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombinePerKeyLambda%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combineperkey_lambda%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h3 id="example-4-combining-with-multiple-arguments">Example 4: Combining with multiple arguments&lt;/h3>
&lt;p>You can pass functions with multiple arguments to &lt;code>CombinePerKey&lt;/code>.
They are passed as additional positional arguments or keyword arguments to the function.&lt;/p>
&lt;p>In this example, the lambda function takes &lt;code>values&lt;/code> and &lt;code>max_value&lt;/code> as arguments.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombinePerKeyMultipleArguments"
data-show="combineperkey_multiple_arguments"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombinePerKeyMultipleArguments%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combineperkey_multiple_arguments%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
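&lt;p>The forwarding of extra arguments can be pictured with a small plain-Python sketch. The &lt;code>combine_per_key&lt;/code> helper here is hypothetical and only imitates the behavior described above: anything passed after the function is handed to it alongside the values for each key.&lt;/p>

```python
from collections import defaultdict


def combine_per_key(pairs, fn, *args, **kwargs):
    """Hypothetical stand-in: extra arguments are forwarded to fn after the values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: fn(values, *args, **kwargs) for key, values in grouped.items()}


produce = [("carrot", 3), ("carrot", 2), ("tomato", 4), ("tomato", 5)]

# max_value passed as a keyword argument.
capped_kw = combine_per_key(
    produce,
    lambda values, max_value: min(sum(values), max_value),
    max_value=8,
)

# The same cap passed as an additional positional argument.
capped_pos = combine_per_key(
    produce,
    lambda values, max_value: min(sum(values), max_value),
    8,
)
print(capped_kw, capped_pos)
```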
&lt;h3 id="example-5-combining-with-a-combinefn">Example 5: Combining with a &lt;code>CombineFn&lt;/code>&lt;/h3>
&lt;p>The most general and flexible way to combine elements is with a class that inherits from &lt;code>CombineFn&lt;/code>.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.create_accumulator">&lt;code>CombineFn.create_accumulator()&lt;/code>&lt;/a>:
This creates an empty accumulator.
For example, an empty accumulator for a sum would be &lt;code>0&lt;/code>, while an empty accumulator for a product (multiplication) would be &lt;code>1&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.add_input">&lt;code>CombineFn.add_input()&lt;/code>&lt;/a>:
Called &lt;em>once per element&lt;/em>.
Takes an accumulator and an input element, combines them and returns the updated accumulator.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.merge_accumulators">&lt;code>CombineFn.merge_accumulators()&lt;/code>&lt;/a>:
Because multiple accumulators can be processed in parallel, this function merges them into a single accumulator.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.extract_output">&lt;code>CombineFn.extract_output()&lt;/code>&lt;/a>:
Lets you perform additional calculations before extracting a result.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombinePerKeyCombineFn"
data-show="combineperkey_combinefn"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombinePerKeyCombineFn%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combineperkey_combinefn%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
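&lt;p>To show how the four methods cooperate per key, here is a minimal plain-Python sketch. It does not import Beam: &lt;code>AverageFn&lt;/code> only mirrors the &lt;code>CombineFn&lt;/code> method names, and &lt;code>simulate_combine_per_key&lt;/code> with its two bundles is a hypothetical model of workers that each build partial accumulators before they are merged.&lt;/p>

```python
class AverageFn:
    """Plain-Python stand-in that mirrors the four CombineFn methods."""

    def create_accumulator(self):
        # Empty accumulator: running sum and number of elements seen.
        return (0.0, 0)

    def add_input(self, accumulator, element):
        total, count = accumulator
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        total = sum(acc[0] for acc in accumulators)
        count = sum(acc[1] for acc in accumulators)
        return (total, count)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


def simulate_combine_per_key(combine_fn, bundles):
    """Hypothetical driver: per-bundle accumulators for each key, merged at the end."""
    partials = {}  # key -> list of accumulators, one per bundle that saw the key
    for bundle in bundles:
        local = {}
        for key, value in bundle:
            accumulator = local.get(key)
            if accumulator is None:
                accumulator = combine_fn.create_accumulator()
            local[key] = combine_fn.add_input(accumulator, value)
        for key, accumulator in local.items():
            partials.setdefault(key, []).append(accumulator)
    return {
        key: combine_fn.extract_output(combine_fn.merge_accumulators(accs))
        for key, accs in partials.items()
    }


bundles = [
    [("carrot", 3), ("tomato", 4)],
    [("carrot", 2), ("tomato", 5), ("tomato", 3), ("eggplant", 1)],
]
averages = simulate_combine_per_key(AverageFn(), bundles)
print(averages)
```

&lt;p>Keeping the sum and the count separate until &lt;code>extract_output&lt;/code> is what makes the merge step correct, since averages of partial bundles cannot simply be averaged again.&lt;/p>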
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;p>You can use the following combiner transforms:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/combineglobally">CombineGlobally&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/combinevalues">CombineValues&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/mean">Mean&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/count">Count&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/top">Top&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/sample">Sample&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>See also &lt;a href="/documentation/transforms/python/aggregation/groupby">GroupBy&lt;/a> which allows you to combine more than one field at once.&lt;/p>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombinePerKey"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p></description></item><item><title>Documentation: CombineValues</title><link>/documentation/transforms/python/aggregation/combinevalues/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/combinevalues/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="combinevalues">CombineValues&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineValues"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;p>Combines an iterable of values in a keyed collection of elements.&lt;/p>
&lt;p>See more information in the &lt;a href="/documentation/programming-guide/#combine">Beam Programming Guide&lt;/a>.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>In the following examples, we create a pipeline with a &lt;code>PCollection&lt;/code> of produce.
Then, we apply &lt;code>CombineValues&lt;/code> in multiple ways to combine the keyed values in the &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>&lt;code>CombineValues&lt;/code> accepts a function that takes an &lt;code>iterable&lt;/code> of elements as an input, and combines them to return a single element.
&lt;code>CombineValues&lt;/code> expects a keyed &lt;code>PCollection&lt;/code> of elements, where the value is an iterable of elements to be combined.&lt;/p>
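&lt;p>The expected input shape can be sketched in plain Python. The &lt;code>combine_values&lt;/code> helper and the sample data below are hypothetical and only imitate the description above: each element is a key paired with an iterable, and the function reduces that iterable to a single value.&lt;/p>

```python
def combine_values(keyed_iterables, fn):
    """Hypothetical local stand-in for CombineValues."""
    return [(key, fn(values)) for key, values in keyed_iterables]


# Each element is already keyed, and each value is an iterable to combine.
produce = [
    ("carrot", [3, 2]),
    ("eggplant", [1]),
    ("tomato", [4, 5, 3]),
]
totals = combine_values(produce, sum)
print(totals)
```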
&lt;h3 id="example-1-combining-with-a-predefined-function">Example 1: Combining with a predefined function&lt;/h3>
&lt;p>We use the function
&lt;a href="https://docs.python.org/3/library/functions.html#sum">&lt;code>sum&lt;/code>&lt;/a>
which takes an &lt;code>iterable&lt;/code> of numbers and adds them together.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombineValuesSimple"
data-show="combinevalues_simple"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombineValuesSimple%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combinevalues_simple%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h3 id="example-2-combining-with-a-function">Example 2: Combining with a function&lt;/h3>
&lt;p>We want the sum to be bounded up to a maximum value, so we use
&lt;a href="https://en.wikipedia.org/wiki/Saturation_arithmetic">saturated arithmetic&lt;/a>.&lt;/p>
&lt;p>We define a function &lt;code>saturated_sum&lt;/code> which takes an &lt;code>iterable&lt;/code> of numbers and adds them together, up to a predefined maximum number.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombineValuesFunction"
data-show="combinevalues_function"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombineValuesFunction%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combinevalues_function%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h3 id="example-3-combining-with-a-lambda-function">Example 3: Combining with a lambda function&lt;/h3>
&lt;p>We can also use lambda functions to simplify &lt;strong>Example 2&lt;/strong>.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombineValuesLambda"
data-show="combinevalues_lambda"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombineValuesLambda%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combinevalues_lambda%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h3 id="example-4-combining-with-multiple-arguments">Example 4: Combining with multiple arguments&lt;/h3>
&lt;p>You can pass functions with multiple arguments to &lt;code>CombineValues&lt;/code>.
They are passed as additional positional arguments or keyword arguments to the function.&lt;/p>
&lt;p>In this example, the lambda function takes &lt;code>values&lt;/code> and &lt;code>max_value&lt;/code> as arguments.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombineValuesMultipleArguments"
data-show="combinevalues_multiple_arguments"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombineValuesMultipleArguments%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combinevalues_multiple_arguments%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h3 id="example-5-combining-with-a-combinefn">Example 5: Combining with a &lt;code>CombineFn&lt;/code>&lt;/h3>
&lt;p>The most general and flexible way to combine elements is with a class that inherits from &lt;code>CombineFn&lt;/code>.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.create_accumulator">&lt;code>CombineFn.create_accumulator()&lt;/code>&lt;/a>:
This creates an empty accumulator.
For example, an empty accumulator for a sum would be &lt;code>0&lt;/code>, while an empty accumulator for a product (multiplication) would be &lt;code>1&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.add_input">&lt;code>CombineFn.add_input()&lt;/code>&lt;/a>:
Called &lt;em>once per element&lt;/em>.
Takes an accumulator and an input element, combines them and returns the updated accumulator.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.merge_accumulators">&lt;code>CombineFn.merge_accumulators()&lt;/code>&lt;/a>:
Because multiple accumulators can be processed in parallel, this function merges them into a single accumulator.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineFn.extract_output">&lt;code>CombineFn.extract_output()&lt;/code>&lt;/a>:
Lets you perform additional calculations before extracting a result.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CombineValuesCombineFn"
data-show="combinevalues_combinefn"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CombineValuesCombineFn%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22combinevalues_combinefn%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
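&lt;p>The choice of empty accumulator matters most when an iterable has no elements. The plain-Python sketch below uses a hypothetical &lt;code>ProductFn&lt;/code> that mirrors the &lt;code>CombineFn&lt;/code> method names, with made-up sample data, to show why the empty accumulator for a product is &lt;code>1&lt;/code>.&lt;/p>

```python
class ProductFn:
    """Plain-Python stand-in that mirrors the four CombineFn methods."""

    def create_accumulator(self):
        # Multiplying by 1 leaves any value unchanged, so 1 is the empty product.
        return 1

    def add_input(self, accumulator, element):
        return accumulator * element

    def merge_accumulators(self, accumulators):
        merged = self.create_accumulator()
        for accumulator in accumulators:
            merged = merged * accumulator
        return merged

    def extract_output(self, accumulator):
        # No extra calculation is needed for a plain product.
        return accumulator


def apply_to_values(combine_fn, values):
    """Hypothetical helper: run the accumulator lifecycle over one iterable."""
    accumulator = combine_fn.create_accumulator()
    for value in values:
        accumulator = combine_fn.add_input(accumulator, value)
    return combine_fn.extract_output(accumulator)


produce = [("carrot", [3, 2]), ("tomato", [4, 5, 3]), ("zucchini", [])]
products = [(key, apply_to_values(ProductFn(), values)) for key, values in produce]
print(products)
```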
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;p>You can use the following combiner transforms:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/combineglobally">CombineGlobally&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/combineperkey">CombinePerKey&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/mean">Mean&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/count">Count&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/top">Top&lt;/a>&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/sample">Sample&lt;/a>&lt;/li>
&lt;/ul>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.CombineValues"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p></description></item><item><title>Documentation: CombineWithContext</title><link>/documentation/transforms/java/aggregation/combinewithcontext/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/java/aggregation/combinewithcontext/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="combinewithcontext">CombineWithContext&lt;/h1>
&lt;table align="left">
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/CombineWithContext.html">
&lt;img src="/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
&lt;/a>
&lt;/table>
&lt;br>&lt;br>
&lt;p>A class of transforms that contains combine functions that have access to &lt;code>PipelineOptions&lt;/code> and side inputs through &lt;code>CombineWithContext.Context&lt;/code>.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>See &lt;a href="https://issues.apache.org/jira/browse/BEAM-7703">BEAM-7703&lt;/a> for updates.&lt;/p>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/combine">Combine&lt;/a>
for combining all values associated with a key to a single result&lt;/li>
&lt;/ul></description></item><item><title>Documentation: CombineWithContext</title><link>/documentation/transforms/python/aggregation/combinewithcontext/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/combinewithcontext/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="combinewithcontext">CombineWithContext&lt;/h1>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>See &lt;a href="https://github.com/apache/beam/issues/19547">Issue 19547&lt;/a> for updates.&lt;/p>
&lt;h2 id="related-transforms">Related transforms&lt;/h2></description></item><item><title>Documentation: Container environments</title><link>/documentation/runtime/environments/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/runtime/environments/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="container-environments">Container environments&lt;/h1>
&lt;p>The Beam SDK runtime environment can be &lt;a href="https://www.docker.com/resources/what-container">containerized&lt;/a> with &lt;a href="https://www.docker.com/">Docker&lt;/a> to isolate it from other runtime systems. To learn more about the container environment, read the Beam &lt;a href="https://s.apache.org/beam-fn-api-container-contract">SDK Harness container contract&lt;/a>.&lt;/p>
&lt;p>Prebuilt SDK container images are released per supported language during Beam releases and pushed to &lt;a href="https://hub.docker.com/search?q=apache%2Fbeam&amp;amp;type=image">Docker Hub&lt;/a>.&lt;/p>
&lt;h2 id="custom-containers">Custom containers&lt;/h2>
&lt;p>You may want to customize container images for many reasons, including:&lt;/p>
&lt;ul>
&lt;li>Pre-installing additional dependencies&lt;/li>
&lt;li>Launching third-party software in the worker environment&lt;/li>
&lt;li>Further customizing the execution environment&lt;/li>
&lt;/ul>
&lt;p>This guide describes how to create and use customized containers for the Beam SDKs.&lt;/p>
&lt;h3 id="prerequisites">Prerequisites&lt;/h3>
&lt;ul>
&lt;li>This guide requires building images using Docker. &lt;a href="https://docs.docker.com/get-docker/">Install Docker locally&lt;/a>. Some CI/CD platforms like &lt;a href="https://cloud.google.com/cloud-build/docs/building/build-containers">Google Cloud Build&lt;/a> also provide the ability to build images using Docker.&lt;/li>
&lt;li>For remote execution engines/runners, have a container registry to host your custom container image. Options include &lt;a href="https://hub.docker.com/">Docker Hub&lt;/a> or a &amp;ldquo;self-hosted&amp;rdquo; repository, including cloud-specific container registries like &lt;a href="https://cloud.google.com/container-registry">Google Container Registry&lt;/a> (GCR) or &lt;a href="https://aws.amazon.com/ecr/">Amazon Elastic Container Registry&lt;/a> (ECR). Make sure your registry can be accessed by your execution engine or runner.&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>&lt;strong>NOTE&lt;/strong>: On Nov 20, 2020, Docker Hub put &lt;a href="https://www.docker.com/increase-rate-limits">rate limits&lt;/a> into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times.&lt;/p>
&lt;/blockquote>
&lt;p>For optimal user experience, we also recommend you use the latest released version of Beam.&lt;/p>
&lt;h3 id="building-and-pushing-custom-containers">Building and pushing custom containers&lt;/h3>
&lt;p>Beam &lt;a href="https://hub.docker.com/search?q=apache%2Fbeam&amp;amp;type=image">SDK container images&lt;/a> are built from Dockerfiles checked into the &lt;a href="https://github.com/apache/beam">GitHub&lt;/a> repository and published to Docker Hub for every release. You can build customized containers in one of three ways:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>&lt;a href="#writing-new-dockerfiles">Writing a new&lt;/a> Dockerfile based on a released container image&lt;/strong>. This is sufficient for simple additions to the image, such as adding artifacts or environment variables.&lt;/li>
&lt;li>&lt;strong>&lt;a href="#modifying-dockerfiles">Modifying&lt;/a> a source Dockerfile in &lt;a href="https://github.com/apache/beam">Beam&lt;/a>&lt;/strong>. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions).&lt;/li>
&lt;li>&lt;strong>&lt;a href="#modify-existing-base-image">Modifying&lt;/a> an existing container image to make it compatible with Apache Beam Runners&lt;/strong>. This method is used when users start from an existing image, and configure the image to be compatible with Apache Beam Runners.&lt;/li>
&lt;/ol>
&lt;h4 id="writing-new-dockerfiles">Writing a new Dockerfile based on an existing published container image&lt;/h4>
&lt;ol>
&lt;li>Create a new Dockerfile that designates a base image using the &lt;a href="https://docs.docker.com/engine/reference/builder/#from">FROM instruction&lt;/a>.&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>FROM apache/beam_python3.7_sdk:2.25.0
ENV FOO=bar
COPY /src/path/to/file /dest/path/to/file/
&lt;/code>&lt;/pre>&lt;p>This &lt;code>Dockerfile&lt;/code> uses the prebuilt Python 3.7 SDK container image &lt;a href="https://hub.docker.com/r/apache/beam_python3.7_sdk">&lt;code>beam_python3.7_sdk&lt;/code>&lt;/a> tagged at SDK version &lt;code>2.25.0&lt;/code>, and adds an environment variable and a file to the image.&lt;/p>
&lt;ol start="2">
&lt;li>&lt;a href="https://docs.docker.com/engine/reference/commandline/build/">Build&lt;/a> the image using Docker.&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>export BASE_IMAGE=&amp;#34;apache/beam_python3.7_sdk:2.25.0&amp;#34;
export IMAGE_NAME=&amp;#34;myremoterepo/mybeamsdk&amp;#34;
# Avoid using `latest` with custom containers to make reproducing failures easier.
export TAG=&amp;#34;mybeamsdk-versioned-tag&amp;#34;
# Optional - pull the base image into your local Docker daemon to ensure
# you have the most up-to-date version of the base image locally.
docker pull &amp;#34;${BASE_IMAGE}&amp;#34;
docker build -f Dockerfile -t &amp;#34;${IMAGE_NAME}:${TAG}&amp;#34; .
&lt;/code>&lt;/pre>&lt;ol start="3">
&lt;li>If your runner is running remotely, retag and &lt;a href="https://docs.docker.com/engine/reference/commandline/push/">push&lt;/a> the image to the appropriate repository.&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>docker push &amp;#34;${IMAGE_NAME}:${TAG}&amp;#34;
&lt;/code>&lt;/pre>&lt;ol start="4">
&lt;li>After pushing a container image, verify that the remote image ID and digest match the local image ID and digest shown in the output of &lt;code>docker build&lt;/code> or &lt;code>docker images&lt;/code>.&lt;/li>
&lt;/ol>
&lt;h4 id="modifying-dockerfiles">Modifying a source Dockerfile in Beam&lt;/h4>
&lt;p>This method requires building image artifacts from Beam source. For additional instructions on setting up your development environment, see the &lt;a href="/contribute/#development-setup">Contribution guide&lt;/a>.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>NOTE&lt;/strong>: It is recommended that you start from a stable release branch (&lt;code>release-X.XX.X&lt;/code>) corresponding to the same version of the SDK to run your pipeline. Differences in SDK version may result in unexpected errors.&lt;/p>
&lt;/blockquote>
&lt;ol>
&lt;li>Clone the &lt;code>beam&lt;/code> repository.&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>export BEAM_SDK_VERSION=&amp;#34;2.26.0&amp;#34;
git clone https://github.com/apache/beam.git
cd beam
# Save current directory as working directory
export BEAM_WORKDIR=$PWD
git checkout origin/release-$BEAM_SDK_VERSION
&lt;/code>&lt;/pre>&lt;ol start="2">
&lt;li>
&lt;p>Customize the &lt;code>Dockerfile&lt;/code> for a given language, typically located at &lt;code>sdks/&amp;lt;language&amp;gt;/container/Dockerfile&lt;/code> (for example, the &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile">Dockerfile for Python&lt;/a>).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Return to the root Beam directory and run the Gradle &lt;code>docker&lt;/code> target for your image.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>cd $BEAM_WORKDIR
# The default repository of each SDK
./gradlew :sdks:java:container:java8:docker
./gradlew :sdks:java:container:java11:docker
./gradlew :sdks:java:container:java17:docker
./gradlew :sdks:go:container:docker
./gradlew :sdks:python:container:py38:docker
./gradlew :sdks:python:container:py39:docker
./gradlew :sdks:python:container:py310:docker
./gradlew :sdks:python:container:py311:docker
# Shortcut for building all Python SDKs
./gradlew :sdks:python:container:buildAll
&lt;/code>&lt;/pre>&lt;ol start="4">
&lt;li>Verify the images you built were created by running &lt;code>docker images&lt;/code>.&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>$&amp;gt; docker images --digests
REPOSITORY TAG DIGEST IMAGE ID CREATED SIZE
apache/beam_java8_sdk latest sha256:... ... 1 min ago ...
apache/beam_java11_sdk latest sha256:... ... 1 min ago ...
apache/beam_java17_sdk latest sha256:... ... 1 min ago ...
apache/beam_python3.8_sdk latest sha256:... ... 1 min ago ...
apache/beam_python3.9_sdk latest sha256:... ... 1 min ago ...
apache/beam_python3.10_sdk latest sha256:... ... 1 min ago ...
apache/beam_python3.11_sdk latest sha256:... ... 1 min ago ...
apache/beam_go_sdk latest sha256:... ... 1 min ago ...
&lt;/code>&lt;/pre>&lt;ol start="5">
&lt;li>If your runner is running remotely, retag the image and &lt;a href="https://docs.docker.com/engine/reference/commandline/push/">push&lt;/a> the image to your repository. You can skip this step if you provide a custom repo/tag as &lt;a href="#additional-build-parameters">additional parameters&lt;/a>.&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>export BEAM_SDK_VERSION=&amp;#34;2.26.0&amp;#34;
export IMAGE_NAME=&amp;#34;gcr.io/my-gcp-project/beam_python3.7_sdk&amp;#34;
export TAG=&amp;#34;${BEAM_SDK_VERSION}-custom&amp;#34;
docker tag apache/beam_python3.7_sdk &amp;#34;${IMAGE_NAME}:${TAG}&amp;#34;
docker push &amp;#34;${IMAGE_NAME}:${TAG}&amp;#34;
&lt;/code>&lt;/pre>&lt;ol start="6">
&lt;li>After pushing a container image, verify that the remote image ID and digest match the local image ID and digest output from &lt;code>docker images --digests&lt;/code>.&lt;/li>
&lt;/ol>
&lt;h4 id="additional-build-parameters">Additional build parameters&lt;/h4>
&lt;p>The docker Gradle task defines a default image repository and &lt;a href="https://docs.docker.com/engine/reference/commandline/tag/">tag&lt;/a>. The default repository is the Docker Hub &lt;code>apache&lt;/code> namespace, and the default tag is the &lt;a href="https://github.com/apache/beam/blob/master/gradle.properties">SDK version&lt;/a> defined in &lt;code>gradle.properties&lt;/code>.&lt;/p>
&lt;p>You can specify a different repository or tag for built images by providing parameters to the build task. For example:&lt;/p>
&lt;pre tabindex="0">&lt;code>./gradlew :sdks:python:container:py36:docker -Pdocker-repository-root=&amp;#34;example-repo&amp;#34; -Pdocker-tag=&amp;#34;2.26.0-custom&amp;#34;
&lt;/code>&lt;/pre>&lt;p>builds the Python 3.6 container and tags it as &lt;code>example-repo/beam_python3.6_sdk:2.26.0-custom&lt;/code>.&lt;/p>
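&lt;p>To make the naming rule concrete, the following sketch composes the image reference you would expect from a given repository root, Python version, and tag. It follows the pattern of the example above. &lt;code>beam_image_name&lt;/code> is a hypothetical helper and not part of the Beam build.&lt;/p>

```shell
#!/bin/sh
# beam_image_name is a hypothetical helper, not part of Beam.
# It prints the image reference expected for a Python SDK container.
beam_image_name() {
  repo_root="$1"   # value passed as -Pdocker-repository-root
  py_version="$2"  # Python minor version, for example 3.6
  tag="$3"         # value passed as -Pdocker-tag
  printf '%s/beam_python%s_sdk:%s\n' "$repo_root" "$py_version" "$tag"
}

# Prints: example-repo/beam_python3.6_sdk:2.26.0-custom
beam_image_name "example-repo" "3.6" "2.26.0-custom"
```

&lt;p>Comparing this value with the output of &lt;code>docker images&lt;/code> is one way to confirm that the build picked up your parameters.&lt;/p>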
&lt;p>Beam 2.21.0 introduced the &lt;code>docker-pull-licenses&lt;/code> flag, which adds licenses and notices for third-party dependencies to the Docker images. For example:&lt;/p>
&lt;pre tabindex="0">&lt;code>./gradlew :sdks:java:container:java8:docker -Pdocker-pull-licenses
&lt;/code>&lt;/pre>&lt;p>creates a Java 8 SDK image with appropriate licenses in &lt;code>/opt/apache/beam/third_party_licenses/&lt;/code>.&lt;/p>
&lt;p>By default, no licenses/notices are added to the docker images.&lt;/p>
&lt;h4 id="modify-existing-base-image">Modifying an existing container image to make it compatible with Apache Beam Runners&lt;/h4>
&lt;p>Beam offers a way to provide your own custom container image. The easiest way to build a new custom image that is compatible with Apache Beam Runners is to use a &lt;a href="https://docs.docker.com/develop/develop-images/multistage-build/">multi-stage build&lt;/a> process. This copies over the necessary artifacts from a default Apache Beam base image to build your custom container image.&lt;/p>
&lt;ol>
&lt;li>Copy necessary artifacts from Apache Beam base image to your image.&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code># This can be any container image,
FROM python:3.8-bookworm
# Install SDK. (needed for Python SDK)
RUN pip install --no-cache-dir apache-beam[gcp]==2.52.0
# Copy files from official SDK image, including script/dependencies.
COPY --from=apache/beam_python3.8_sdk:2.52.0 /opt/apache/beam /opt/apache/beam
# Perform any additional customizations if desired
# Set the entrypoint to Apache Beam SDK launcher.
ENTRYPOINT [&amp;#34;/opt/apache/beam/boot&amp;#34;]
&lt;/code>&lt;/pre>&lt;blockquote>
&lt;p>&lt;strong>NOTE&lt;/strong>: This example assumes necessary dependencies (in this case, Python 3.8 and pip) have been installed on the existing base image. Installing the Apache Beam SDK into the image will ensure that the image has the necessary SDK dependencies and reduce the worker startup time.
The version specified in the &lt;code>RUN&lt;/code> instruction must match the version used to launch the pipeline.&lt;br>
&lt;strong>Make sure that the Python or Java runtime version specified in the base image is the same as the version used to run the pipeline.&lt;/strong>&lt;/p>
&lt;/blockquote>
&lt;blockquote>
&lt;p>&lt;strong>NOTE&lt;/strong>: Any additional Python dependencies should be installed in the global Python environment in the custom image.&lt;/p>
&lt;/blockquote>
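&lt;p>Because the SDK version installed with &lt;code>pip&lt;/code> and the tag of the copied SDK image need to agree, a quick check before building can catch a mismatch. The following is a minimal sketch under stated assumptions: &lt;code>check_beam_versions&lt;/code> is a hypothetical helper, and it assumes a Dockerfile shaped like the one above, with a single &lt;code>apache-beam==VERSION&lt;/code> pin and an image reference that contains only one colon.&lt;/p>

```shell
#!/bin/sh
# check_beam_versions is a hypothetical helper, not part of Beam.
# It compares the apache-beam version pinned in a Dockerfile with the
# tag of the SDK image named in the COPY --from instruction.
# Assumes the image reference contains a single colon (no registry port).
check_beam_versions() {
  dockerfile="$1"
  # Version after "apache-beam==" or "apache-beam[extras]==".
  pip_version=$(sed -n 's/.*apache-beam[^=]*==\([0-9.]*\).*/\1/p' "$dockerfile" | head -n 1)
  # Tag after the colon in "--from=image:tag".
  image_tag=$(sed -n 's/.*--from=[^:]*:\([^ ]*\) .*/\1/p' "$dockerfile" | head -n 1)
  if [ -z "$pip_version" ] || [ -z "$image_tag" ]; then
    echo "could not find both versions in $dockerfile"
    return 2
  fi
  if [ "$pip_version" = "$image_tag" ]; then
    echo "ok: $pip_version"
  else
    echo "mismatch: pip=$pip_version image=$image_tag"
    return 1
  fi
}
```

&lt;p>Running &lt;code>check_beam_versions Dockerfile&lt;/code> before &lt;code>docker build&lt;/code> returns a non-zero status when the two versions differ. It does not check the version used to launch the pipeline, which still has to be compared separately.&lt;/p>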
&lt;ol start="2">
&lt;li>&lt;a href="https://docs.docker.com/engine/reference/commandline/build/">Build&lt;/a> and &lt;a href="https://docs.docker.com/engine/reference/commandline/push/">push&lt;/a> the image using Docker.&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code> export BASE_IMAGE=&amp;#34;apache/beam_python3.8_sdk:2.52.0&amp;#34;
export IMAGE_NAME=&amp;#34;myremoterepo/mybeamsdk&amp;#34;
export TAG=&amp;#34;latest&amp;#34;
# Optional - pull the base image into your local Docker daemon to ensure
# you have the most up-to-date version of the base image locally.
docker pull &amp;#34;${BASE_IMAGE}&amp;#34;
docker build -f Dockerfile -t &amp;#34;${IMAGE_NAME}:${TAG}&amp;#34; .
&lt;/code>&lt;/pre>&lt;ol start="3">
&lt;li>If your runner is running remotely, &lt;a href="https://docs.docker.com/engine/reference/commandline/push/">push&lt;/a> the image to your repository.&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>docker push &amp;#34;${IMAGE_NAME}:${TAG}&amp;#34;
&lt;/code>&lt;/pre>&lt;h4 id="from-scratch-go">Building a compatible container image from scratch (Go)&lt;/h4>
&lt;p>Starting with the 2.55.0 release, the Beam Go SDK uses &lt;a href="https://github.com/GoogleContainerTools/distroless">distroless images&lt;/a> as a base.
These images have a reduced security attack surface because they do not include common tools and utilities.
This can make it difficult to customize the image using one of the approaches above.
As a fallback, it&amp;rsquo;s possible to build a custom image from scratch by building a matching boot loader and setting
it as the container&amp;rsquo;s entry point.&lt;/p>
&lt;p>For example, if it&amp;rsquo;s preferable to use alpine as the container OS your multi-stage docker file might
look like the following:&lt;/p>
&lt;pre tabindex="0">&lt;code>FROM golang:latest-alpine AS build_base
# Set the Current Working Directory inside the container
WORKDIR /tmp/beam
# Build the Beam Go bootloader, to the local directory, matching your Beam version.
# Similar go targets exist for other SDK languages.
RUN GOBIN=`pwd` go install github.com/apache/beam/sdks/v2/go/container@v2.53.0
# Set the real base image.
FROM alpine:3.9
RUN apk add ca-certificates
# The following are required for the container to operate correctly.
# Copy the boot loader `container` to the image.
COPY --from=build_base /tmp/beam/container /opt/apache/beam/boot
# Set the container to use the newly built boot loader.
ENTRYPOINT [&amp;#34;/opt/apache/beam/boot&amp;#34;]
&lt;/code>&lt;/pre>&lt;p>Build and push the new image as when &lt;a href="#modify-existing-base-image">modifying an existing base image&lt;/a> above.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>NOTE&lt;/strong>: Java and Python require additional dependencies, such as their runtimes, and SDK packages for
a valid container image. The bootloader isn&amp;rsquo;t sufficient for creating a custom container for these SDKs.&lt;/p>
&lt;/blockquote>
&lt;h2 id="running-pipelines">Running pipelines with custom container images&lt;/h2>
&lt;p>The common way to provide a container image is the
&lt;code>--environment_config&lt;/code> flag, which is supported by the
PortableRunner and by runners that accept PortableRunner flags.
Other runners, such as Dataflow, use different flags to specify containers.&lt;/p>
&lt;div class='runner-direct snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-direct" data-lang="direct">export IMAGE=&amp;#34;my-repo/beam_python_sdk_custom&amp;#34;
export TAG=&amp;#34;X.Y.Z&amp;#34;
export IMAGE_URL=&amp;#34;${IMAGE}:${TAG}&amp;#34;
python -m apache_beam.examples.wordcount \
--input=/path/to/inputfile \
--output /path/to/write/counts \
--runner=PortableRunner \
--job_endpoint=embed \
--environment_type=&amp;#34;DOCKER&amp;#34; \
--environment_config=&amp;#34;${IMAGE_URL}&amp;#34;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-flink snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-flink" data-lang="flink">export IMAGE=&amp;#34;my-repo/beam_python_sdk_custom&amp;#34;
export TAG=&amp;#34;X.Y.Z&amp;#34;
export IMAGE_URL = &amp;#34;${IMAGE}:${TAG}&amp;#34;
# Run a pipeline using the FlinkRunner which starts a Flink job server.
python -m apache_beam.examples.wordcount \
--input=/path/to/inputfile \
--output=path/to/write/counts \
--runner=FlinkRunner \
# When running batch jobs locally, we need to reuse the container.
--environment_cache_millis=10000 \
--environment_type=&amp;#34;DOCKER&amp;#34; \
--environment_config=&amp;#34;${IMAGE_URL}&amp;#34;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-spark snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-spark" data-lang="spark">export IMAGE=&amp;#34;my-repo/beam_python_sdk_custom&amp;#34;
export TAG=&amp;#34;X.Y.Z&amp;#34;
export IMAGE_URL = &amp;#34;${IMAGE}:${TAG}&amp;#34;
# Run a pipeline using the SparkRunner which starts the Spark job server
python -m apache_beam.examples.wordcount \
--input=/path/to/inputfile \
--output=path/to/write/counts \
--runner=SparkRunner \
# When running batch jobs locally, we need to reuse the container.
--environment_cache_millis=10000 \
--environment_type=&amp;#34;DOCKER&amp;#34; \
--environment_config=&amp;#34;${IMAGE_URL}&amp;#34;&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class='runner-dataflow snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code class="language-dataflow" data-lang="dataflow">export GCS_PATH=&amp;#34;gs://my-gcs-bucket&amp;#34;
export GCP_PROJECT=&amp;#34;my-gcp-project&amp;#34;
export REGION=&amp;#34;us-central1&amp;#34;
# By default, the Dataflow runner has access to the GCR images
# under the same project.
export IMAGE=&amp;#34;my-repo/beam_python_sdk_custom&amp;#34;
export TAG=&amp;#34;X.Y.Z&amp;#34;
export IMAGE_URL=&amp;#34;${IMAGE}:${TAG}&amp;#34;
# Run a pipeline on Dataflow.
# This is a Python batch pipeline, so to run on Dataflow Runner V2
# you must specify the experiment &amp;#34;use_runner_v2&amp;#34;
python -m apache_beam.examples.wordcount \
--input gs://dataflow-samples/shakespeare/kinglear.txt \
--output &amp;#34;${GCS_PATH}/counts&amp;#34; \
--runner DataflowRunner \
--project $GCP_PROJECT \
--region $REGION \
--temp_location &amp;#34;${GCS_PATH}/tmp/&amp;#34; \
--experiment=use_runner_v2 \
--sdk_container_image=$IMAGE_URL&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>Avoid using the tag &lt;code>:latest&lt;/code> with your custom images. Instead, tag your builds with a date
or a unique identifier. If something goes wrong, a unique tag makes
it possible to revert the pipeline execution to a previously known working
configuration and to inspect what changed.&lt;/p>
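&lt;p>The tagging advice above can be enforced with a small pre-flight check. The following is a minimal sketch: &lt;code>make_image_tag&lt;/code> and &lt;code>require_pinned_tag&lt;/code> are hypothetical helpers, and the version-plus-date tag format is an assumption, one of many possible unique identifiers.&lt;/p>

```shell
#!/bin/sh
# Both helpers are hypothetical and not part of Beam or Docker.

# Build a tag from an SDK version and a date stamp, for example 2.52.0-20240131.
# The date stamp defaults to today when the second argument is omitted.
make_image_tag() {
  sdk_version="$1"
  stamp="${2:-$(date +%Y%m%d)}"
  printf '%s-%s\n' "$sdk_version" "$stamp"
}

# Return non-zero when an image reference has no tag or uses :latest.
require_pinned_tag() {
  image_url="$1"
  name="${image_url##*/}"   # last path component, so a registry port is ignored
  case "$name" in
    *:latest) echo "refusing $image_url: uses :latest"; return 1 ;;
    *:*)      return 0 ;;
    *)        echo "refusing $image_url: no tag, Docker would default to :latest"; return 1 ;;
  esac
}
```

&lt;p>For example, calling &lt;code>require_pinned_tag&lt;/code> on the image URL before launching the pipeline stops the script early when the image would resolve to &lt;code>:latest&lt;/code>.&lt;/p>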
&lt;h3 id="troubleshooting">Troubleshooting&lt;/h3>
&lt;p>The following section describes some common issues to consider
when you encounter unexpected errors running Beam pipelines with
custom containers.&lt;/p>
&lt;ul>
&lt;li>Differences in language and SDK version between the container SDK and
pipeline SDK may result in unexpected errors due to incompatibility. For best
results, make sure to use the same stable SDK version for your base container
and when running your pipeline.&lt;/li>
&lt;li>If you are running into unexpected errors when using remote containers,
make sure that your container exists in the remote repository and can be
accessed by any third-party service, if needed.&lt;/li>
&lt;li>Local runners attempt to pull remote images and fall back to local
images. If an image cannot be pulled by the Docker daemon,
you may see a log message like:
&lt;pre tabindex="0">&lt;code>Error response from daemon: manifest for remote.repo/beam_python3.7_sdk:2.25.0-custom not found: manifest unknown: ...
INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Unable to pull image...
&lt;/code>&lt;/pre>&lt;/li>
&lt;/ul></description></item><item><title>Documentation: Count</title><link>/documentation/transforms/java/aggregation/count/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/java/aggregation/count/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="count">Count&lt;/h1>
&lt;table align="left">
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Count.html">
&lt;img src="/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
&lt;/a>
&lt;/table>
&lt;br>&lt;br>
&lt;p>Counts the number of elements within each aggregation. The &lt;code>Count&lt;/code>
transform has three varieties:&lt;/p>
&lt;ul>
&lt;li>&lt;code>Count.globally()&lt;/code> counts the number of elements in the entire
&lt;code>PCollection&lt;/code>. The result is a collection with a single element.&lt;/li>
&lt;li>&lt;code>Count.perKey()&lt;/code> counts how many elements are associated with each
key. It ignores the values. The resulting collection has one
output for every key in the input collection.&lt;/li>
&lt;li>&lt;code>Count.perElement()&lt;/code> counts how many times each element appears
in the input collection. The output is a collection of key-value
pairs, each containing a unique element and the number of times it
appeared in the original collection.&lt;/li>
&lt;/ul>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>&lt;strong>Example 1&lt;/strong>: Count.globally&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_Count"
data-show="main_section"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_Count%22%2c%22sdk%22%3a%22java%22%2c%22show%22%3a%22main_section%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;p>&lt;strong>Example 2&lt;/strong>: Count.perKey&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_CountPerKey"
data-show="main_section"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_CountPerKey%22%2c%22sdk%22%3a%22java%22%2c%22show%22%3a%22main_section%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/approximateunique">ApproximateUnique&lt;/a>
estimates the number of distinct elements or distinct values in key-value pairs&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/sum">Sum&lt;/a> computes
the sum of elements in a collection&lt;/li>
&lt;/ul></description></item><item><title>Documentation: Count</title><link>/documentation/transforms/python/aggregation/count/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/count/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="count">Count&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.combiners.html#apache_beam.transforms.combiners.Count"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;p>Counts the number of elements within each aggregation.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>In the following example, we create a pipeline with two &lt;code>PCollection&lt;/code>s of produce.
Then, we apply &lt;code>Count&lt;/code> to get the total number of elements in different ways.&lt;/p>
&lt;h3 id="example-1-counting-all-elements-in-a-pcollection">Example 1: Counting all elements in a PCollection&lt;/h3>
&lt;p>We use &lt;code>Count.Globally()&lt;/code> to count &lt;em>all&lt;/em> elements in a &lt;code>PCollection&lt;/code>, even if there are duplicate elements.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CountGlobally"
data-show="count_globally"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CountGlobally%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22count_globally%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h3 id="example-2-counting-elements-for-each-key">Example 2: Counting elements for each key&lt;/h3>
&lt;p>We use &lt;code>Count.PerKey()&lt;/code> to count the elements for each unique key in a &lt;code>PCollection&lt;/code> of key-values.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CountPerKey"
data-show="count_per_key"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CountPerKey%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22count_per_key%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h3 id="example-3-counting-all-unique-elements">Example 3: Counting all unique elements&lt;/h3>
&lt;p>We use &lt;code>Count.PerElement()&lt;/code> to count the occurrences of each unique element in a &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_CountPerElement"
data-show="count_per_element"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_CountPerElement%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22count_per_element%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;p>N/A&lt;/p>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.combiners.html#apache_beam.transforms.combiners.Count"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p></description></item><item><title>Documentation: Create</title><link>/documentation/transforms/java/other/create/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/java/other/create/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="create">Create&lt;/h1>
&lt;table align="left">
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Create.html">
&lt;img src="/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
&lt;/a>
&lt;/table>
&lt;br>&lt;br>
&lt;p>Creates a collection containing a specified set of elements. This is useful
for testing, as well as creating an initial input to process in parallel.
For example, a single element to execute a one-time &lt;code>ParDo&lt;/code> or a list of filenames to be read.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_Create"
data-show="main_section"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_Create%22%2c%22sdk%22%3a%22java%22%2c%22show%22%3a%22main_section%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;p>N/A&lt;/p></description></item><item><title>Documentation: Create</title><link>/documentation/transforms/python/other/create/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/other/create/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="create">Create&lt;/h1>
&lt;table align="left">
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.Create">
&lt;img src="/images/logos/sdks/python.png" width="20px" height="20px"
alt="Pydoc" />
Pydoc
&lt;/a>
&lt;/table>
&lt;br>&lt;br>
&lt;p>Creates a collection containing a specified set of elements. This is
useful for testing, as well as creating an initial input to process
in parallel. For example, a single element to execute a one-time
&lt;code>ParDo&lt;/code> or a list of filenames to be read.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_Create"
data-show="create"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_Create%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22create%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2></description></item><item><title>Documentation: Create Your Pipeline</title><link>/documentation/pipelines/create-your-pipeline/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/pipelines/create-your-pipeline/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="create-your-pipeline">Create Your Pipeline&lt;/h1>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#creating-your-pipeline-object">Creating Your Pipeline Object&lt;/a>&lt;/li>
&lt;li>&lt;a href="#reading-data-into-your-pipeline">Reading Data Into Your Pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#applying-transforms-to-process-pipeline-data">Applying Transforms to Process Pipeline Data&lt;/a>&lt;/li>
&lt;li>&lt;a href="#writing-or-outputting-your-final-pipeline-data">Writing or Outputting Your Final Pipeline Data&lt;/a>&lt;/li>
&lt;li>&lt;a href="#running-your-pipeline">Running Your Pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#whats-next">What&amp;rsquo;s next&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;p>Your Beam program expresses a data processing pipeline, from start to finish. This section explains the mechanics of using the classes in the Beam SDKs to build a pipeline. To construct a pipeline using the classes in the Beam SDKs, your program will need to perform the following general steps:&lt;/p>
&lt;ul>
&lt;li>Create a &lt;code>Pipeline&lt;/code> object.&lt;/li>
&lt;li>Use a &lt;strong>Read&lt;/strong> or &lt;strong>Create&lt;/strong> transform to create one or more &lt;code>PCollection&lt;/code>s for your pipeline data.&lt;/li>
&lt;li>Apply &lt;strong>transforms&lt;/strong> to each &lt;code>PCollection&lt;/code>. Transforms can change, filter, group, analyze, or otherwise process the elements in a &lt;code>PCollection&lt;/code>. Each transform creates a new output &lt;code>PCollection&lt;/code>, to which you can apply additional transforms until processing is complete.&lt;/li>
&lt;li>&lt;strong>Write&lt;/strong> or otherwise output the final, transformed &lt;code>PCollection&lt;/code>s.&lt;/li>
&lt;li>&lt;strong>Run&lt;/strong> the pipeline.&lt;/li>
&lt;/ul>
&lt;h2 id="creating-your-pipeline-object">Creating Your Pipeline Object&lt;/h2>
&lt;p>A Beam program often starts by creating a &lt;code>Pipeline&lt;/code> object.&lt;/p>
&lt;p>In the Beam SDKs, each pipeline is represented by an explicit object of type &lt;code>Pipeline&lt;/code>. Each &lt;code>Pipeline&lt;/code> object is an independent entity that encapsulates both the data the pipeline operates over and the transforms that get applied to that data.&lt;/p>
&lt;p>To create a pipeline, declare a &lt;code>Pipeline&lt;/code> object, and pass it some &lt;a href="/documentation/programming-guide#configuring-pipeline-options">configuration options&lt;/a>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Start by defining the options for the pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PipelineOptions&lt;/span> &lt;span class="n">options&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PipelineOptionsFactory&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Then create the pipeline.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">Pipeline&lt;/span> &lt;span class="n">p&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">options&lt;/span>&lt;span class="o">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h2 id="reading-data-into-your-pipeline">Reading Data Into Your Pipeline&lt;/h2>
&lt;p>To create your pipeline&amp;rsquo;s initial &lt;code>PCollection&lt;/code>, you apply a root transform to your pipeline object. A root transform creates a &lt;code>PCollection&lt;/code> from either an external data source or some local data you specify.&lt;/p>
&lt;p>There are two kinds of root transforms in the Beam SDKs: &lt;code>Read&lt;/code> and &lt;code>Create&lt;/code>. &lt;code>Read&lt;/code> transforms read data from an external source, such as a text file or a database table. &lt;code>Create&lt;/code> transforms create a &lt;code>PCollection&lt;/code> from an in-memory &lt;code>java.util.Collection&lt;/code>.&lt;/p>
&lt;p>The following example code shows how to &lt;code>apply&lt;/code> a &lt;code>TextIO.Read&lt;/code> root transform to read data from a text file. The transform is applied to a &lt;code>Pipeline&lt;/code> object &lt;code>p&lt;/code>, and returns a pipeline data set in the form of a &lt;code>PCollection&amp;lt;String&amp;gt;&lt;/code>:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">lines&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;ReadLines&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gs://some/inputData.txt&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h2 id="applying-transforms-to-process-pipeline-data">Applying Transforms to Process Pipeline Data&lt;/h2>
&lt;p>You can manipulate your data using the various &lt;a href="/documentation/programming-guide/#transforms">transforms&lt;/a> provided in the Beam SDKs. To do this, you &lt;strong>apply&lt;/strong> the transforms to your pipeline&amp;rsquo;s &lt;code>PCollection&lt;/code> by calling the &lt;code>apply&lt;/code> method on each &lt;code>PCollection&lt;/code> that you want to process and passing the desired transform object as an argument.&lt;/p>
&lt;p>The following code shows how to &lt;code>apply&lt;/code> a transform to a &lt;code>PCollection&lt;/code> of strings. The transform is a user-defined custom transform that reverses the contents of each string and outputs a new &lt;code>PCollection&lt;/code> containing the reversed strings.&lt;/p>
&lt;p>The input is a &lt;code>PCollection&amp;lt;String&amp;gt;&lt;/code> called &lt;code>words&lt;/code>; the code passes an instance of a &lt;code>PTransform&lt;/code> object called &lt;code>ReverseWords&lt;/code> to &lt;code>apply&lt;/code>, and saves the return value as the &lt;code>PCollection&amp;lt;String&amp;gt;&lt;/code> called &lt;code>reversedWords&lt;/code>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">words&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">reversedWords&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">words&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">ReverseWords&lt;/span>&lt;span class="o">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h2 id="writing-or-outputting-your-final-pipeline-data">Writing or Outputting Your Final Pipeline Data&lt;/h2>
&lt;p>Once your pipeline has applied all of its transforms, you&amp;rsquo;ll usually need to output the results. To output one of your pipeline&amp;rsquo;s final &lt;code>PCollection&lt;/code>s, apply a &lt;code>Write&lt;/code> transform to that &lt;code>PCollection&lt;/code>. &lt;code>Write&lt;/code> transforms can output the elements of a &lt;code>PCollection&lt;/code> to an external data sink, such as a database table. You can use &lt;code>Write&lt;/code> to output a &lt;code>PCollection&lt;/code> at any time in your pipeline, although you&amp;rsquo;ll typically write out data at the end of your pipeline.&lt;/p>
&lt;p>The following example code shows how to &lt;code>apply&lt;/code> a &lt;code>TextIO.Write&lt;/code> transform to write a &lt;code>PCollection&lt;/code> of &lt;code>String&lt;/code> to a text file:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">filteredWords&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">filteredWords&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;WriteMyFile&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">write&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">to&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gs://some/outputData.txt&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h2 id="running-your-pipeline">Running Your Pipeline&lt;/h2>
&lt;p>Once you have constructed your pipeline, use the &lt;code>run&lt;/code> method to execute the pipeline. Pipelines are executed asynchronously: the program you create sends a specification for your pipeline to a &lt;strong>pipeline runner&lt;/strong>, which then constructs and runs the actual series of pipeline operations.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">run&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>The &lt;code>run&lt;/code> method is asynchronous. If you&amp;rsquo;d like blocking execution instead, append a call to the &lt;code>waitUntilFinish&lt;/code> method:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">run&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">waitUntilFinish&lt;/span>&lt;span class="o">();&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h2 id="whats-next">What&amp;rsquo;s next&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/programming-guide">Programming Guide&lt;/a> - Learn the details of creating your pipeline, configuring pipeline options, and applying transforms.&lt;/li>
&lt;li>&lt;a href="/documentation/pipelines/test-your-pipeline">Test your pipeline&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>Documentation: Cross Language RunInference</title><link>/documentation/ml/multi-language-inference/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/ml/multi-language-inference/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="using-runinference-from-java-sdk">Using RunInference from Java SDK&lt;/h1>
&lt;p>The pipeline in this example is written in Java and reads the input data from Google Cloud Storage. With the help of a &lt;a href="https://beam.apache.org/documentation/programming-guide/#1312-creating-cross-language-python-transforms">PythonExternalTransform&lt;/a>,
a composite Python transform is called to do the preprocessing, postprocessing, and inference.
Lastly, the data is written back to Google Cloud Storage in the Java pipeline.&lt;/p>
&lt;p>You can find the code used in this example in the &lt;a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/multi_language_inference">Beam repository&lt;/a>.&lt;/p>
&lt;h2 id="nlp-model-and-dataset">NLP model and dataset&lt;/h2>
&lt;p>A &lt;code>bert-base-uncased&lt;/code> natural language processing (NLP) model is used to run inference. This model is open source and available on &lt;a href="https://huggingface.co/bert-base-uncased">Hugging Face&lt;/a>. This BERT model is
used to predict the last word of a sentence based on the context of the sentence.&lt;/p>
&lt;p>We also use an &lt;a href="https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews?select=IMDB+Dataset.csv">IMDB movie reviews&lt;/a> dataset, which is an open-source dataset that is available on Kaggle.&lt;/p>
&lt;p>The following is a sample of the data after preprocessing:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;strong>Text&lt;/strong>&lt;/th>
&lt;th style="text-align:left">&lt;strong>Last Word&lt;/strong>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;img width=700/>&lt;/td>
&lt;td style="text-align:left">&lt;img width=100/>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>One of the other reviewers has mentioned that after watching just 1 Oz episode you&amp;rsquo;ll be [MASK]&lt;/td>
&lt;td style="text-align:left">hooked&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>A wonderful little [MASK]&lt;/td>
&lt;td style="text-align:left">production&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>So im not a big fan of Boll&amp;rsquo;s work but then again not many [MASK]&lt;/td>
&lt;td style="text-align:left">are&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>This a fantastic movie of three prisoners who become [MASK]&lt;/td>
&lt;td style="text-align:left">famous&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Some films just simply should not be [MASK]&lt;/td>
&lt;td style="text-align:left">remade&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>The Karen Carpenter Story shows a little more about singer Karen Carpenter&amp;rsquo;s complex [MASK]&lt;/td>
&lt;td style="text-align:left">life&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="multi-language-inference-pipeline">Multi-language Inference pipeline&lt;/h2>
&lt;p>When using multi-language pipelines, you have access to a much larger pool of transforms. For more information, see &lt;a href="https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines">Multi-language pipelines&lt;/a> in the Apache Beam Programming Guide.&lt;/p>
&lt;h3 id="custom-python-transform">Custom Python transform&lt;/h3>
&lt;p>In addition to running inference, we also need to perform preprocessing and postprocessing on the data. Postprocessing the data makes it possible to interpret the output. To perform these three tasks, a single composite custom PTransform is written, with a separate DoFn or PTransform for each task, as shown in the following snippet:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">expand&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">pcoll&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pcoll&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Preprocess&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Preprocess&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_tokenizer&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Inference&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">RunInference&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">KeyedModelHandler&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_model_handler&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s1">&amp;#39;Postprocess&amp;#39;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ParDo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Postprocess&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_tokenizer&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>First, the data is preprocessed. In this case, the raw textual data is cleaned and tokenized for the BERT model. All these steps run in the &lt;code>Preprocess&lt;/code> DoFn. The &lt;code>Preprocess&lt;/code> DoFn takes a single element as input and returns a list with both the original text and the tokenized text.&lt;/p>
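&lt;p>The masking step that produces the sample table above can be sketched in plain Python. This is a minimal illustration, not the example&amp;rsquo;s actual &lt;code>Preprocess&lt;/code> DoFn: the function name &lt;code>mask_last_word&lt;/code> and the set of stripped punctuation characters are assumptions, and BERT tokenization is left out.&lt;/p>

```python
import re

MASK_TOKEN = "[MASK]"  # mask token used by BERT-style tokenizers


def mask_last_word(text):
    """Return (masked_text, last_word), or None when there is no context to keep.

    Hypothetical helper: the real Preprocess DoFn also tokenizes the masked text.
    """
    # Collapse runs of whitespace, then drop trailing punctuation and spaces
    # so that the final token is an actual word.
    cleaned = re.sub(r"\s+", " ", text).strip().rstrip(".,!?;: ")
    words = cleaned.split(" ")
    if len(words) == 1:
        # Empty input or a single word: nothing left to predict from.
        return None
    masked_text = " ".join(words[:-1] + [MASK_TOKEN])
    return masked_text, words[-1]


if __name__ == "__main__":
    print(mask_last_word("A wonderful little production."))
```

&lt;p>In a pipeline, logic like this would typically sit inside the &lt;code>process&lt;/code> method of the DoFn, followed by the tokenization step.&lt;/p>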
&lt;p>The preprocessed data is then used to run inference. This is done in the &lt;a href="https://beam.apache.org/documentation/ml/overview/#runinference">&lt;code>RunInference&lt;/code>&lt;/a> PTransform, which is already available in the Apache Beam SDK. The &lt;code>RunInference&lt;/code> PTransform requires one parameter, a model handler. In this example, the &lt;code>KeyedModelHandler&lt;/code> is used because the &lt;code>Preprocess&lt;/code> DoFn also outputs the original sentence, which serves as the key for each tokenized example. You can change how preprocessing is done based on your requirements. The model handler is defined in the following initialization function of the composite PTransform:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="fm">__init__&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">model&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">model_path&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_model&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">model&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">logging&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">info&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Downloading &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_model&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> model from GCS.&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_model_config&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BertConfig&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">from_pretrained&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_model&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_tokenizer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BertTokenizer&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">from_pretrained&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_model&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_model_handler&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">PytorchModelHandlerKeyedTensorWrapper&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">state_dict_path&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">model_path&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">model_class&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">BertForMaskedLM&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">model_params&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="s1">&amp;#39;config&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="bp">self&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">_model_config&lt;/span>&lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">device&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;cuda:0&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>PytorchModelHandlerKeyedTensorWrapper&lt;/code>, a wrapper around the &lt;code>PytorchModelHandlerKeyedTensor&lt;/code> model handler, is used. The &lt;code>PytorchModelHandlerKeyedTensor&lt;/code> model handler runs inference on a PyTorch model. Because the tokenized strings generated from &lt;code>BertTokenizer&lt;/code> might have different lengths and &lt;code>stack()&lt;/code> requires tensors to be the same size, the &lt;code>PytorchModelHandlerKeyedTensorWrapper&lt;/code> limits the batch size to 1. Restricting &lt;code>max_batch_size&lt;/code> to 1 means that each &lt;code>run_inference()&lt;/code> call receives a batch containing a single example. The following code shows the definition of the wrapper:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">class&lt;/span> &lt;span class="nc">PytorchModelHandlerKeyedTensorWrapper&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">PytorchModelHandlerKeyedTensor&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">def&lt;/span> &lt;span class="nf">batch_elements_kwargs&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="bp">self&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">{&lt;/span>&lt;span class="s1">&amp;#39;max_batch_size&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>An alternative approach is to make all the tensors have the same length. This &lt;a href="https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb">example&lt;/a> shows how to do that.&lt;/p>
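&lt;p>The idea behind the equal-length approach can be illustrated without PyTorch. The sketch below is not the code from the linked notebook: the helper name &lt;code>pad_batch&lt;/code> is hypothetical, plain lists of token ids stand in for tensors, and the token ids and padding id are illustrative assumptions.&lt;/p>

```python
def pad_batch(token_id_lists, pad_id=0):
    """Pad every list of token ids to the length of the longest one.

    Returns the padded lists plus an attention mask that tells the model
    which positions hold real tokens (1) and which hold padding (0).
    pad_id=0 is an assumed padding id; check your tokenizer's pad_token_id.
    """
    if not token_id_lists:
        return [], []
    max_len = max(len(ids) for ids in token_id_lists)
    padded, attention_mask = [], []
    for ids in token_id_lists:
        n_pad = max_len - len(ids)
        padded.append(list(ids) + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return padded, attention_mask


if __name__ == "__main__":
    # Two tokenized sentences of different lengths (illustrative ids).
    batch = [[101, 1037, 103, 102], [101, 2070, 3152, 2323, 103, 102]]
    padded, mask = pad_batch(batch)
    print(padded)
    print(mask)
```

&lt;p>Once every example has the same length, the examples can be stacked into one batch, so the batch size no longer has to be limited to 1.&lt;/p>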
&lt;p>The &lt;code>ModelConfig&lt;/code> and &lt;code>ModelTokenizer&lt;/code> are loaded in the initialization function. The &lt;code>ModelConfig&lt;/code> is used to define the model architecture, and the &lt;code>ModelTokenizer&lt;/code> is used to tokenize the input data. The following two parameters are used for these tasks:&lt;/p>
&lt;ul>
&lt;li>&lt;code>model&lt;/code>: The name of the model that is used for inference. In this example it is &lt;code>bert-base-uncased&lt;/code>.&lt;/li>
&lt;li>&lt;code>model_path&lt;/code>: The path to the &lt;code>state_dict&lt;/code> of the model that is used for inference. In this example it is a path to a Google Cloud Storage bucket, where the &lt;code>state_dict&lt;/code> is stored.&lt;/li>
&lt;/ul>
&lt;p>Both of these parameters are specified in the Java &lt;code>PipelineOptions&lt;/code>.&lt;/p>
&lt;p>Finally, we postprocess the model predictions in the &lt;code>Postprocess&lt;/code> DoFn. The &lt;code>Postprocess&lt;/code> DoFn returns the original text, the last word of the sentence, and the predicted word.&lt;/p>
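&lt;p>The shape of this postprocessing step can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the example&amp;rsquo;s actual &lt;code>Postprocess&lt;/code> DoFn: the function names, the toy vocabulary, and the list of scores for the masked position are all hypothetical stand-ins for the tokenizer and the model output.&lt;/p>

```python
def predict_masked_word(mask_scores, vocabulary):
    """Return the vocabulary entry with the highest score at the masked position."""
    best_index = max(range(len(mask_scores)), key=mask_scores.__getitem__)
    return vocabulary[best_index]


def postprocess(original_text, mask_scores, vocabulary):
    """Return (original text, actual last word, predicted word)."""
    # Strip trailing punctuation so the actual last word is comparable
    # to the predicted one.
    actual_last_word = original_text.rstrip(".,!?;: ").split(" ")[-1]
    predicted_word = predict_masked_word(mask_scores, vocabulary)
    return original_text, actual_last_word, predicted_word


if __name__ == "__main__":
    vocabulary = ["movie", "production", "life"]  # toy vocabulary
    scores = [0.1, 2.3, -0.5]  # one score per vocabulary entry
    print(postprocess("A wonderful little production.", scores, vocabulary))
```

&lt;p>Keeping the actual last word next to the predicted word makes it straightforward to compare the two in the output file.&lt;/p>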
&lt;h3 id="compile-python-code-into-package">Compile Python code into package&lt;/h3>
&lt;p>The custom Python code needs to be written in a local package and be compiled as a tarball. This package can then be used by the Java pipeline. The following example shows how to compile the Python package into a tarball:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl"> pip install --upgrade build &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> python -m build --sdist
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In order to run this, a &lt;code>setup.py&lt;/code> is required. The path to the tarball will be used as an argument in the pipeline options of the Java pipeline.&lt;/p>
&lt;h3 id="run-the-java-pipeline">Run the Java pipeline&lt;/h3>
&lt;p>The Java pipeline is defined in the &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/multi_language_inference/last_word_prediction/src/main/java/org/apache/beam/examples/MultiLangRunInference.java#L32">&lt;code>MultiLangRunInference&lt;/code>&lt;/a> class. In this pipeline, the data is read from Google Cloud Storage, the cross-language Python transform is applied, and the output is written back to Google Cloud Storage.&lt;/p>
&lt;p>The &lt;code>PythonExternalTransform&lt;/code> is used to inject the cross-language Python transform into the Java pipeline. &lt;code>PythonExternalTransform&lt;/code> takes a string parameter which is the fully qualified name of the Python transform.&lt;/p>
&lt;p>The &lt;code>withKwarg&lt;/code> method is used to specify the parameters that are needed for the Python transform. In this example, the &lt;code>model&lt;/code> and &lt;code>model_path&lt;/code> parameters are specified. These parameters are used in the initialization function of the composite Python PTransform, as shown in the Custom Python transform section above. Finally, the &lt;code>withExtraPackages&lt;/code> method is used to specify the additional Python dependencies that are needed for the Python transform. In this example, the &lt;code>local_packages&lt;/code> list is used, which contains Python requirements and the path to the compiled tarball.&lt;/p>
&lt;p>To run the pipeline, use the following command:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">mvn compile exec:java -Dexec.mainClass&lt;span class="o">=&lt;/span>org.apache.beam.examples.MultiLangRunInference &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -Dexec.args&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;--runner=DataflowRunner \
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> --project=&lt;/span>&lt;span class="nv">$GCP_PROJECT&lt;/span>&lt;span class="s2">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> --region=&lt;/span>&lt;span class="nv">$GCP_REGION&lt;/span>&lt;span class="s2"> \
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> --gcpTempLocation=gs://&lt;/span>&lt;span class="nv">$GCP_BUCKET&lt;/span>&lt;span class="s2">/temp/ \
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> --inputFile=gs://&lt;/span>&lt;span class="nv">$GCP_BUCKET&lt;/span>&lt;span class="s2">/input/imdb_reviews.csv \
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> --outputFile=gs://&lt;/span>&lt;span class="nv">$GCP_BUCKET&lt;/span>&lt;span class="s2">/output/output.txt \
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> --modelPath=gs://&lt;/span>&lt;span class="nv">$GCP_BUCKET&lt;/span>&lt;span class="s2">/input/bert-model/bert-base-uncased.pth \
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> --modelName=&lt;/span>&lt;span class="nv">$MODEL_NAME&lt;/span>&lt;span class="s2"> \
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> --localPackage=&lt;/span>&lt;span class="nv">$LOCAL_PACKAGE&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> -Pdataflow-runner
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The standard Google Cloud and runner parameters are specified. The &lt;code>inputFile&lt;/code> and &lt;code>outputFile&lt;/code> parameters are used to specify the input and output files. The &lt;code>modelPath&lt;/code> and &lt;code>modelName&lt;/code> custom parameters are passed to the &lt;code>PythonExternalTransform&lt;/code>. Finally, the &lt;code>localPackage&lt;/code> parameter is used to specify the path to the compiled Python package, which contains the custom Python transform.&lt;/p>
&lt;h2 id="final-remarks">Final remarks&lt;/h2>
&lt;p>Use this example as a base to create other custom multi-language inference pipelines. You can also use other SDKs. For example, Go also has a wrapper that can make cross-language transforms. For more information, see &lt;a href="https://beam.apache.org/documentation/programming-guide/#1323-using-cross-language-transforms-in-a-go-pipeline">Using cross-language transforms in a Go pipeline&lt;/a> in the Apache Beam Programming Guide.&lt;/p>
&lt;p>The full code used in this example can be found on &lt;a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/multi_language_inference">GitHub&lt;/a>.&lt;/p></description></item><item><title>Documentation: Custom I/O patterns</title><link>/documentation/patterns/custom-io/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/patterns/custom-io/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="custom-io-patterns">Custom I/O patterns&lt;/h1>
&lt;p>This page describes common patterns in pipelines with &lt;a href="/documentation/io/developing-io-overview/">custom I/O connectors&lt;/a>. Custom I/O connectors connect pipelines to databases that aren&amp;rsquo;t supported by Beam&amp;rsquo;s &lt;a href="/documentation/io/connectors/">built-in I/O transforms&lt;/a>.&lt;/p>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="choosing-between-built-in-and-custom-connectors">Choosing between built-in and custom connectors&lt;/h2>
&lt;p>&lt;a href="/documentation/io/connectors/">Built-in I/O connectors&lt;/a> are tested and hardened, so use them whenever possible. Only use custom I/O connectors when:&lt;/p>
&lt;ul>
&lt;li>No built-in options exist&lt;/li>
&lt;li>Your pipeline pulls in a small subset of source data&lt;/li>
&lt;/ul>
&lt;p>For instance, use a custom I/O connector to enrich pipeline elements with a small subset of source data. If you’re processing a sales order and adding information to each purchase, you can use a custom I/O connector to pull the small subset of data into your pipeline (instead of processing the entire source).&lt;/p>
&lt;p>Beam distributes work across many threads, so custom I/O connectors can increase your data source’s load average. You can reduce the load with the &lt;span class="language-java">&lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.StartBundle.html">start&lt;/a>&lt;/span>&lt;span class="language-py">&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html?highlight=bundle#apache_beam.transforms.core.DoFn.start_bundle">start&lt;/a>&lt;/span> and &lt;span class="language-java">&lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.FinishBundle.html">finish&lt;/a>&lt;/span>&lt;span class="language-py">&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html?highlight=bundle#apache_beam.transforms.core.DoFn.finish_bundle">finish&lt;/a>&lt;/span> bundle annotations.&lt;/p></description></item><item><title>Documentation: Custom window patterns</title><link>/documentation/patterns/custom-windows/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/patterns/custom-windows/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="custom-window-patterns">Custom window patterns&lt;/h1>
&lt;p>The samples on this page demonstrate common custom window patterns. You can create custom windows with &lt;a href="/documentation/programming-guide/#provided-windowing-functions">&lt;code>WindowFn&lt;/code> functions&lt;/a>. For more information, see the &lt;a href="/documentation/programming-guide/#windowing">programming guide section on windowing&lt;/a>.&lt;/p>
&lt;p>&lt;strong>Note&lt;/strong>: Custom merging windows aren&amp;rsquo;t supported in Python (with fnapi).&lt;/p>
&lt;h2 id="using-data-to-dynamically-set-session-window-gaps">Using data to dynamically set session window gaps&lt;/h2>
&lt;p>You can modify the &lt;a href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/windowing/SlidingWindows.html">&lt;code>assignWindows&lt;/code>&lt;/a> function to use data-driven gaps, then window incoming data into sessions.&lt;/p>
&lt;p>Inside the &lt;code>assignWindows&lt;/code> function, access the element being windowed through &lt;code>WindowFn.AssignContext.element()&lt;/code>. The original, fixed-duration &lt;code>assignWindows&lt;/code> function is:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Collection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">IntervalWindow&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">assignWindows&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">WindowFn&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">AssignContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Assign each element into a window from its timestamp until gapDuration in the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// future. Overlapping windows (representing elements within gapDuration of
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// each other) will be merged.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="k">return&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">IntervalWindow&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timestamp&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="n">gapDuration&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="creating-data-driven-gaps">Creating data-driven gaps&lt;/h3>
&lt;p>To create data-driven gaps, add the following snippets to the &lt;code>assignWindows&lt;/code> function:&lt;/p>
&lt;ul>
&lt;li>A default value for when the custom gap is not present in the data&lt;/li>
&lt;li>A way to set the attribute from the main pipeline as a method of the custom windows&lt;/li>
&lt;/ul>
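&lt;p>For example, the following function assigns each element to a window that starts at the element&amp;rsquo;s timestamp and lasts for the duration given by the element&amp;rsquo;s &lt;code>gap&lt;/code> field. If the field is missing or can&amp;rsquo;t be parsed, the function falls back to &lt;code>gapDuration&lt;/code>:&lt;/p>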
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="n">Collection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">IntervalWindow&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="nf">assignWindows&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">AssignContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Assign each element into a window from its timestamp until gapDuration in the
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// future. Overlapping windows (representing elements within gapDuration of
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// each other) will be merged.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Duration&lt;/span> &lt;span class="n">dataDrivenGap&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TableRow&lt;/span> &lt;span class="n">message&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">try&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dataDrivenGap&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">parseLong&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">message&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">get&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;gap&amp;#34;&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">toString&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span> &lt;span class="k">catch&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">Exception&lt;/span> &lt;span class="n">e&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dataDrivenGap&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">gapDuration&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">IntervalWindow&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">timestamp&lt;/span>&lt;span class="o">(),&lt;/span> &lt;span class="n">dataDrivenGap&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
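&lt;p>The fallback in the function above can also be sketched in plain Python, outside of Beam. This is only an illustration: the helper name &lt;code>resolve_gap_seconds&lt;/code> and the dictionary-shaped messages are assumptions, not part of the Beam API.&lt;/p>

```python
def resolve_gap_seconds(message, default_gap_seconds):
    """Return the per-message gap in seconds, or the default if it is unusable."""
    try:
        # Mirrors Long.parseLong(message.get("gap").toString()) in the Java code.
        return int(str(message["gap"]))
    except (KeyError, ValueError):
        # The field is missing or is not a whole number: use the default gap.
        return default_gap_seconds


# Hypothetical messages shaped like the rows used on this page.
with_gap = resolve_gap_seconds({"user": "user-1", "score": "12", "gap": "5"}, 10)
without_gap = resolve_gap_seconds({"user": "user-2", "score": "4"}, 10)
bad_gap = resolve_gap_seconds({"user": "user-3", "gap": "soon"}, 10)
```

&lt;p>Only the first message carries a usable gap, so the other two fall back to the default passed by the caller.&lt;/p>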
&lt;p>Then, set the &lt;code>gapDuration&lt;/code> field in a windowing function:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="kd">class&lt;/span> &lt;span class="nc">DynamicSessions&lt;/span> &lt;span class="kd">extends&lt;/span> &lt;span class="n">WindowFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">TableRow&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">IntervalWindow&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="cm">/** Duration of the gaps between sessions. */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">Duration&lt;/span> &lt;span class="n">gapDuration&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="cm">/** Creates a {@code DynamicSessions} {@link WindowFn} with the specified gap duration. */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">private&lt;/span> &lt;span class="nf">DynamicSessions&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span> &lt;span class="n">gapDuration&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">this&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">gapDuration&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">gapDuration&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="windowing-messages-into-sessions">Windowing messages into sessions&lt;/h3>
&lt;p>After creating data-driven gaps, you can window incoming data into the new, custom sessions.&lt;/p>
&lt;p>First, set the session length to the gap duration:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="cm">/** Creates a {@code DynamicSessions} {@link WindowFn} with the specified gap duration. */&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">public&lt;/span> &lt;span class="kd">static&lt;/span> &lt;span class="n">DynamicSessions&lt;/span> &lt;span class="nf">withDefaultGapDuration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span> &lt;span class="n">gapDuration&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">DynamicSessions&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">gapDuration&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>Lastly, window data into sessions in your pipeline:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;Window into sessions&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Window&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">TableRow&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">into&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">DynamicSessions&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">withDefaultGapDuration&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">10&lt;/span>&lt;span class="o">))));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="example-data-and-windows">Example data and windows&lt;/h3>
&lt;p>The following test data tallies two users&amp;rsquo; scores with and without the &lt;code>gap&lt;/code> attribute:&lt;/p>
&lt;pre tabindex="0">&lt;code>.apply(&amp;#34;Create data&amp;#34;, Create.timestamped(
TimestampedValue.of(&amp;#34;{\&amp;#34;user\&amp;#34;:\&amp;#34;user-1\&amp;#34;,\&amp;#34;score\&amp;#34;:\&amp;#34;12\&amp;#34;,\&amp;#34;gap\&amp;#34;:\&amp;#34;5\&amp;#34;}&amp;#34;, new Instant()),
TimestampedValue.of(&amp;#34;{\&amp;#34;user\&amp;#34;:\&amp;#34;user-2\&amp;#34;,\&amp;#34;score\&amp;#34;:\&amp;#34;4\&amp;#34;}&amp;#34;, new Instant()),
TimestampedValue.of(&amp;#34;{\&amp;#34;user\&amp;#34;:\&amp;#34;user-1\&amp;#34;,\&amp;#34;score\&amp;#34;:\&amp;#34;-3\&amp;#34;,\&amp;#34;gap\&amp;#34;:\&amp;#34;5\&amp;#34;}&amp;#34;, new Instant().plus(2000)),
TimestampedValue.of(&amp;#34;{\&amp;#34;user\&amp;#34;:\&amp;#34;user-1\&amp;#34;,\&amp;#34;score\&amp;#34;:\&amp;#34;2\&amp;#34;,\&amp;#34;gap\&amp;#34;:\&amp;#34;5\&amp;#34;}&amp;#34;, new Instant().plus(9000)),
TimestampedValue.of(&amp;#34;{\&amp;#34;user\&amp;#34;:\&amp;#34;user-1\&amp;#34;,\&amp;#34;score\&amp;#34;:\&amp;#34;7\&amp;#34;,\&amp;#34;gap\&amp;#34;:\&amp;#34;5\&amp;#34;}&amp;#34;, new Instant().plus(12000)),
TimestampedValue.of(&amp;#34;{\&amp;#34;user\&amp;#34;:\&amp;#34;user-2\&amp;#34;,\&amp;#34;score\&amp;#34;:\&amp;#34;10\&amp;#34;}&amp;#34;, new Instant().plus(12000)))
.withCoder(StringUtf8Coder.of()))
&lt;/code>&lt;/pre>&lt;p>The diagram below visualizes the test data:&lt;/p>
&lt;p>&lt;img src="/images/standard-vs-dynamic-sessions.png" alt="Two sets of data and the standard and dynamic sessions with which the data is windowed.">&lt;/p>
&lt;h4 id="standard-sessions">Standard sessions&lt;/h4>
&lt;p>Standard sessions use the following windows and scores:&lt;/p>
&lt;pre tabindex="0">&lt;code>user=user-2, score=4, window=[2019-05-26T13:28:49.122Z..2019-05-26T13:28:59.122Z)
user=user-1, score=18, window=[2019-05-26T13:28:48.582Z..2019-05-26T13:29:12.774Z)
user=user-2, score=10, window=[2019-05-26T13:29:03.367Z..2019-05-26T13:29:13.367Z)
&lt;/code>&lt;/pre>&lt;p>The user with two events (&lt;code>user-2&lt;/code> in the test data) sees them separated by 12 seconds. With standard sessions, the gap defaults to 10 seconds, so the two events fall into different sessions and their scores aren&amp;rsquo;t added.&lt;/p>
&lt;p>The user with four events (&lt;code>user-1&lt;/code> in the test data) sees gaps of two, seven, and three seconds between consecutive events. Because none of the gaps is greater than the default, the four events are in the same standard session and their scores are added together (18 points).&lt;/p>
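&lt;p>The merging behavior can be reproduced with a small plain-Python sketch. It is not Beam code: the helper &lt;code>sessionize&lt;/code> is hypothetical, timestamps are simplified to second offsets taken from the test data, and one gap is applied per call rather than per element.&lt;/p>

```python
def sessionize(events, gap):
    """Merge (timestamp, score) events into sessions and sum the scores.

    Each event opens a window [timestamp, timestamp + gap). Windows that
    overlap are merged into one session.
    """
    sessions = []  # each entry is [start, end, total_score]
    for timestamp, score in sorted(events):
        if sessions and sessions[-1][1] > timestamp:
            # The new window overlaps the current session: extend it.
            sessions[-1][1] = max(sessions[-1][1], timestamp + gap)
            sessions[-1][2] += score
        else:
            # No overlap: start a new session.
            sessions.append([timestamp, timestamp + gap, score])
    return [tuple(session) for session in sessions]


# Offsets in seconds and scores, copied from the test data.
user_1 = [(0, 12), (2, -3), (9, 2), (12, 7)]
user_2 = [(0, 4), (12, 10)]

standard_user_1 = sessionize(user_1, gap=10)
standard_user_2 = sessionize(user_2, gap=10)
five_second_user_1 = sessionize(user_1, gap=5)
```

&lt;p>With a 10-second gap, the four &lt;code>user-1&lt;/code> events collapse into one 18-point session, while the two &lt;code>user-2&lt;/code> events stay apart. Calling the same helper with a five-second gap splits the &lt;code>user-1&lt;/code> events into two 9-point sessions.&lt;/p>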
&lt;h4 id="dynamic-sessions">Dynamic sessions&lt;/h4>
&lt;p>The dynamic sessions specify a five-second gap, so they use the following windows and scores:&lt;/p>
&lt;pre tabindex="0">&lt;code>user=user-2, score=4, window=[2019-05-26T14:30:22.969Z..2019-05-26T14:30:32.969Z)
user=user-1, score=9, window=[2019-05-26T14:30:22.429Z..2019-05-26T14:30:30.553Z)
user=user-1, score=9, window=[2019-05-26T14:30:33.276Z..2019-05-26T14:30:41.849Z)
user=user-2, score=10, window=[2019-05-26T14:30:37.357Z..2019-05-26T14:30:47.357Z)
&lt;/code>&lt;/pre>&lt;p>With dynamic sessions, the user with four events (&lt;code>user-1&lt;/code>) gets different scores. The third message arrives seven seconds after the second message, which exceeds the five-second gap, so it&amp;rsquo;s grouped into a different session. The large, 18-point session is split into two 9-point sessions.&lt;/p></description></item><item><title>Documentation: Data exploration</title><link>/documentation/ml/data-processing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/ml/data-processing/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="data-exploration">Data exploration&lt;/h1>
&lt;p>Several types of Apache Beam data processing are applicable to AI/ML projects:&lt;/p>
&lt;ul>
&lt;li>Data exploration: Learn about your data (properties, distributions, statistics) when you start to deploy your project or when the data changes.&lt;/li>
&lt;li>Data preprocessing: Transform your data so that it is ready to be used to train your model.&lt;/li>
&lt;li>Data postprocessing: After running inference, you might need to transform the output of your model so that it is meaningful.&lt;/li>
&lt;li>Data validation: Check the quality of your data to detect outliers and calculate standard deviations and class distributions.&lt;/li>
&lt;/ul>
&lt;p>Data processing can be grouped into two main topics. This example first examines data exploration and then data pipelines in ML that use both data preprocessing and validation. Data postprocessing is not covered because it is similar to preprocessing. Postprocessing differs only in the order and type of pipeline.&lt;/p>
&lt;h2 id="initial-data-exploration">Initial data exploration&lt;/h2>
&lt;p>&lt;a href="https://pandas.pydata.org/">Pandas&lt;/a> is a popular tool for performing data exploration. Pandas is a data analysis and manipulation tool for Python. It uses DataFrames, a data structure that holds two-dimensional tabular data with labeled rows and columns. The Apache Beam Python SDK provides a &lt;a href="/documentation/dsls/dataframes/overview/">DataFrame API&lt;/a> for working with Pandas-like DataFrame objects.&lt;/p>
&lt;p>The Beam DataFrame API provides a familiar programming interface within an Apache Beam pipeline. You can use this API to explore your data and then reuse the same code in your data preprocessing pipeline. Using the DataFrame API, you can build complex data processing pipelines by invoking standard Pandas commands.&lt;/p>
&lt;p>You can use the DataFrame API in combination with the &lt;a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md">Beam interactive runner&lt;/a> in a &lt;a href="https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development">JupyterLab notebook&lt;/a>. Use the notebook to iteratively develop pipelines and display the results of your individual pipeline steps.&lt;/p>
&lt;p>The following is an example of data exploration in Apache Beam in a notebook:&lt;/p>
&lt;pre tabindex="0">&lt;code>import apache_beam as beam
import apache_beam.runners.interactive.interactive_beam as ib
from apache_beam.dataframe.io import read_csv
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
# Placeholder: replace with the location of your CSV file
input_path = &amp;#39;path/to/data.csv&amp;#39;
p = beam.Pipeline(InteractiveRunner())
beam_df = p | read_csv(input_path)
# Investigate columns and data types
beam_df.dtypes
# Generate descriptive statistics
ib.collect(beam_df.describe())
# Investigate missing values
ib.collect(beam_df.isnull())
&lt;/code>&lt;/pre>&lt;p>For a full end-to-end example that implements data exploration and data preprocessing with Apache Beam and the DataFrame API for your AI/ML project, see the &lt;a href="https://github.com/apache/beam/tree/master/examples/notebooks/beam-ml/dataframe_api_preprocessing.ipynb">Beam Dataframe API tutorial for AI/ML&lt;/a>.&lt;/p>
&lt;h2 id="data-pipeline-for-ml">Data pipeline for ML&lt;/h2>
&lt;p>A typical data preprocessing pipeline consists of the following steps:&lt;/p>
&lt;ol>
&lt;li>Read and write data: Read and write the data from your file system, database, or messaging queue. Apache Beam has a rich set of &lt;a href="/documentation/io/built-in/">IO connectors&lt;/a> for ingesting and writing data.&lt;/li>
&lt;li>Data cleaning: Filter and clean your data before using it in your ML model. You might remove duplicate or irrelevant data, correct mistakes in your dataset, filter out unwanted outliers, or handle missing data.&lt;/li>
&lt;li>Data transformations: Your data needs to match the input format that your model expects for training. You might need to normalize, one-hot encode, scale, or vectorize your data.&lt;/li>
&lt;li>Data enrichment: You might want to enrich your data with external data sources to make your data more meaningful or easier for an ML model to interpret. For example, you might want to transform a city name or address into a set of coordinates.&lt;/li>
&lt;li>Data validation and metrics: Make sure your data adheres to a specific set of requirements that can be validated in your pipeline. Report metrics from your data, such as the class distributions.&lt;/li>
&lt;/ol>
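&lt;p>As a small illustration of the data transformation step, min-max scaling maps a value linearly onto the range 0 to 1. The following plain-Python sketch is not Beam code; the helper &lt;code>min_max_scale&lt;/code> and the feature ranges are assumptions chosen only for illustration.&lt;/p>

```python
def min_max_scale(value, lower, upper):
    """Scale value linearly so that lower maps to 0.0 and upper maps to 1.0."""
    if upper == lower:
        raise ValueError("upper and lower bounds must differ")
    return (value - lower) / (upper - lower)


row = {"age": 25, "height": 176, "weight": 60}

# Assumed feature ranges; in practice, derive them from your training data.
ranges = {"age": (0, 100), "height": (150, 200), "weight": (50, 100)}

scaled = {name: min_max_scale(row[name], *ranges[name]) for name in row}
```

&lt;p>Fixing the ranges up front keeps the scaling identical for every element, which matters when elements are processed independently and in parallel.&lt;/p>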
&lt;p>You can use an Apache Beam pipeline to implement all of these steps. This example shows a pipeline that demonstrates all of the steps previously mentioned:&lt;/p>
&lt;pre tabindex="0">&lt;code>import apache_beam as beam
from apache_beam.metrics import Metrics

with beam.Pipeline() as pipeline:
  # Create data
  input_data = (
      pipeline
      | beam.Create([
          {&amp;#39;age&amp;#39;: 25, &amp;#39;height&amp;#39;: 176, &amp;#39;weight&amp;#39;: 60, &amp;#39;city&amp;#39;: &amp;#39;London&amp;#39;},
          {&amp;#39;age&amp;#39;: 61, &amp;#39;height&amp;#39;: 192, &amp;#39;weight&amp;#39;: 95, &amp;#39;city&amp;#39;: &amp;#39;Brussels&amp;#39;},
          {&amp;#39;age&amp;#39;: 48, &amp;#39;height&amp;#39;: 163, &amp;#39;weight&amp;#39;: None, &amp;#39;city&amp;#39;: &amp;#39;Berlin&amp;#39;}]))

  # Clean data: drop rows that have no weight
  def filter_missing_data(row):
    return row[&amp;#39;weight&amp;#39;] is not None
  cleaned_data = input_data | beam.Filter(filter_missing_data)

  # Transform data: scale each feature with fixed ranges
  def scale_min_max_data(row):
    row = dict(row)  # Copy the element instead of mutating the input
    row[&amp;#39;age&amp;#39;] = row[&amp;#39;age&amp;#39;] / 100
    row[&amp;#39;height&amp;#39;] = (row[&amp;#39;height&amp;#39;] - 150) / 50
    row[&amp;#39;weight&amp;#39;] = (row[&amp;#39;weight&amp;#39;] - 50) / 50
    yield row
  transformed_data = cleaned_data | beam.FlatMap(scale_min_max_data)

  # Enrich data
  # Assumption: each line of coordinates.csv has the form city,latitude,longitude
  def parse_coordinates(line):
    city, latitude, longitude = line.split(&amp;#39;,&amp;#39;)
    return city, (float(latitude), float(longitude))
  side_input = (
      pipeline
      | beam.io.ReadFromText(&amp;#39;coordinates.csv&amp;#39;)
      | beam.Map(parse_coordinates))
  def coordinates_lookup(row, coordinates):
    row = dict(row)  # Copy the element instead of mutating the input
    row[&amp;#39;coordinates&amp;#39;] = coordinates.get(row[&amp;#39;city&amp;#39;], (0, 0))
    del row[&amp;#39;city&amp;#39;]
    yield row
  enriched_data = (
      transformed_data
      | beam.FlatMap(coordinates_lookup, coordinates=beam.pvalue.AsDict(side_input)))

  # Metrics: count the rows that reach this step
  counter = Metrics.counter(&amp;#39;main&amp;#39;, &amp;#39;counter&amp;#39;)
  def count_data(row):
    counter.inc()
    yield row
  output_data = enriched_data | beam.FlatMap(count_data)

  # Write data
  output_data | beam.io.WriteToText(&amp;#39;output.csv&amp;#39;)
&lt;/code>&lt;/pre></description></item><item><title>Documentation: Design Your Pipeline</title><link>/documentation/pipelines/design-your-pipeline/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/pipelines/design-your-pipeline/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="design-your-pipeline">Design Your Pipeline&lt;/h1>
&lt;nav id="TableOfContents">
&lt;ul>
&lt;li>&lt;a href="#what-to-consider-when-designing-your-pipeline">What to consider when designing your pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#a-basic-pipeline">A basic pipeline&lt;/a>&lt;/li>
&lt;li>&lt;a href="#branching-pcollections">Branching PCollections&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#multiple-transforms-process-the-same-pcollection">Multiple transforms process the same PCollection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#a-single-transform-that-produces-multiple-outputs">A single transform that produces multiple outputs&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#merging-pcollections">Merging PCollections&lt;/a>&lt;/li>
&lt;li>&lt;a href="#multiple-sources">Multiple sources&lt;/a>&lt;/li>
&lt;li>&lt;a href="#whats-next">What&amp;rsquo;s next&lt;/a>&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;p>This page helps you design your Apache Beam pipeline. It includes information about how to determine your pipeline&amp;rsquo;s structure, how to choose which transforms to apply to your data, and how to determine your input and output methods.&lt;/p>
&lt;p>Before reading this section, it is recommended that you become familiar with the information in the &lt;a href="/documentation/programming-guide">Beam programming guide&lt;/a>.&lt;/p>
&lt;h2 id="what-to-consider-when-designing-your-pipeline">What to consider when designing your pipeline&lt;/h2>
&lt;p>When designing your Beam pipeline, consider a few basic questions:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Where is your input data stored?&lt;/strong> How many sets of input data do you have? This will determine what kinds of &lt;code>Read&lt;/code> transforms you&amp;rsquo;ll need to apply at the start of your pipeline.&lt;/li>
&lt;li>&lt;strong>What does your data look like?&lt;/strong> It might be plaintext, formatted log files, or rows in a database table. Some Beam transforms work exclusively on &lt;code>PCollection&lt;/code>s of key/value pairs; you&amp;rsquo;ll need to determine if and how your data is keyed and how to best represent that in your pipeline&amp;rsquo;s &lt;code>PCollection&lt;/code>(s).&lt;/li>
&lt;li>&lt;strong>What do you want to do with your data?&lt;/strong> The core transforms in the Beam SDKs are general purpose. Knowing how you need to change or manipulate your data will determine how you build core transforms like &lt;a href="/documentation/programming-guide/#pardo">ParDo&lt;/a>, or when you use pre-written transforms included with the Beam SDKs.&lt;/li>
&lt;li>&lt;strong>What does your output data look like, and where should it go?&lt;/strong> This will determine what kinds of &lt;code>Write&lt;/code> transforms you&amp;rsquo;ll need to apply at the end of your pipeline.&lt;/li>
&lt;/ul>
&lt;h2 id="a-basic-pipeline">A basic pipeline&lt;/h2>
&lt;p>The simplest pipelines represent a linear flow of operations, as shown in figure 1.&lt;/p>
&lt;p>&lt;img src="/images/design-your-pipeline-linear.svg" alt="A linear pipeline starts with one input collection, sequentially applies three transforms, and ends with one output collection.">&lt;/p>
&lt;p>&lt;em>Figure 1: A linear pipeline.&lt;/em>&lt;/p>
&lt;p>However, your pipeline can be significantly more complex. A pipeline represents a &lt;a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">Directed Acyclic Graph&lt;/a> of steps. It can have multiple input sources, multiple output sinks, and its operations (&lt;code>PTransform&lt;/code>s) can both read and output multiple &lt;code>PCollection&lt;/code>s. The following examples show some of the different shapes your pipeline can take.&lt;/p>
&lt;h2 id="branching-pcollections">Branching PCollections&lt;/h2>
&lt;p>It&amp;rsquo;s important to understand that transforms do not consume &lt;code>PCollection&lt;/code>s; instead, they consider each individual element of a &lt;code>PCollection&lt;/code> and create a new &lt;code>PCollection&lt;/code> as output. This way, you can do different things to different elements in the same &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;h3 id="multiple-transforms-process-the-same-pcollection">Multiple transforms process the same PCollection&lt;/h3>
&lt;p>You can use the same &lt;code>PCollection&lt;/code> as input for multiple transforms without consuming the input or altering it.&lt;/p>
&lt;p>The pipeline in figure 2 is a branching pipeline. The pipeline reads its input (first names represented as strings) from a database table and creates a &lt;code>PCollection&lt;/code> of table rows. Then, the pipeline applies multiple transforms to the &lt;strong>same&lt;/strong> &lt;code>PCollection&lt;/code>. Transform A extracts all the names in that &lt;code>PCollection&lt;/code> that start with the letter &amp;lsquo;A&amp;rsquo;, and Transform B extracts all the names in that &lt;code>PCollection&lt;/code> that start with the letter &amp;lsquo;B&amp;rsquo;. Both transforms A and B have the same input &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>&lt;img src="/images/design-your-pipeline-multiple-pcollections.svg" alt="The pipeline applies two transforms to a single input collection. Each transform produces an output collection.">&lt;/p>
&lt;p>&lt;em>Figure 2: A branching pipeline. Two transforms are applied to a single
PCollection of database table rows.&lt;/em>&lt;/p>
&lt;p>The following example code applies two transforms to a single input collection.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">dbRowCollection&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="o">...;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">aCollection&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">dbRowCollection&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;aTrans&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;(){&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">startsWith&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;A&amp;#34;&lt;/span>&lt;span class="o">)){&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">bCollection&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">dbRowCollection&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;bTrans&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;(){&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">startsWith&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;B&amp;#34;&lt;/span>&lt;span class="o">)){&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h3 id="a-single-transform-that-produces-multiple-outputs">A single transform that produces multiple outputs&lt;/h3>
&lt;p>Another way to branch a pipeline is to have a &lt;strong>single&lt;/strong> transform output to multiple &lt;code>PCollection&lt;/code>s by using &lt;a href="/documentation/programming-guide/#additional-outputs">tagged outputs&lt;/a>. Transforms that produce more than one output process each element of the input once, and output to zero or more &lt;code>PCollection&lt;/code>s.&lt;/p>
&lt;p>Figure 3 illustrates the same example described above, but with one transform that produces multiple outputs. Names that start with &amp;lsquo;A&amp;rsquo; are added to the main output &lt;code>PCollection&lt;/code>, and names that start with &amp;lsquo;B&amp;rsquo; are added to an additional output &lt;code>PCollection&lt;/code>.&lt;/p>
&lt;p>&lt;img src="/images/design-your-pipeline-additional-outputs.svg" alt="The pipeline applies one transform that produces multiple output collections.">&lt;/p>
&lt;p>&lt;em>Figure 3: A pipeline with a transform that outputs multiple PCollections.&lt;/em>&lt;/p>
&lt;p>If you compare the pipelines in figure 2 and figure 3, you can see that they perform
the same operation in different ways. The pipeline in figure 2 contains two
transforms that process the elements in the same input &lt;code>PCollection&lt;/code>. One
transform uses the following logic:&lt;/p>
&lt;pre>if (starts with 'A') { outputToPCollectionA }&lt;/pre>
&lt;p>while the other transform uses:&lt;/p>
&lt;pre>if (starts with 'B') { outputToPCollectionB }&lt;/pre>
&lt;p>Because each transform reads the entire input &lt;code>PCollection&lt;/code>, each element in the input &lt;code>PCollection&lt;/code> is processed twice.&lt;/p>
&lt;p>The pipeline in figure 3 performs the same operation with only one transform, which uses the following logic:&lt;/p>
&lt;pre>if (starts with 'A') { outputToPCollectionA } else if (starts with 'B') { outputToPCollectionB }&lt;/pre>
&lt;p>where each element in the input &lt;code>PCollection&lt;/code> is processed once.&lt;/p>
&lt;p>The following example code applies one transform that processes each element
once and outputs two collections.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Define two TupleTags, one for each output.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">startsWithATag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;(){};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">startsWithBTag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;(){};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollectionTuple&lt;/span> &lt;span class="n">mixedCollection&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dbRowCollection&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ParDo&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">processElement&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">ProcessContext&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">startsWith&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;A&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Emit to main output, which is the output with tag startsWithATag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span> &lt;span class="k">else&lt;/span> &lt;span class="k">if&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">startsWith&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;B&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Emit to output with tag startsWithBTag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">output&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">startsWithBTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">c&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">element&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">})&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Specify main output. In this example, it is the output
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// with tag startsWithATag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">withOutputTags&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">startsWithATag&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Specify the output with tag startsWithBTag, as a TupleTagList.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">TupleTagList&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">startsWithBTag&lt;/span>&lt;span class="o">)));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Get subset of the output with tag startsWithATag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">mixedCollection&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">get&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">startsWithATag&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(...);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Get subset of the output with tag startsWithBTag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">mixedCollection&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">get&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">startsWithBTag&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(...);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>You can use either mechanism to produce multiple output &lt;code>PCollection&lt;/code>s. Use the first option (multiple transforms that process the same &lt;code>PCollection&lt;/code>) when it does not make logical sense to combine the processing logic into one &lt;code>ParDo&lt;/code>. Use the second option (a single transform that produces multiple outputs) when the per-element computation is time-consuming, because each element is then processed only once. The second option also scales better if you plan to add more output types in the future.&lt;/p>
&lt;h2 id="merging-pcollections">Merging PCollections&lt;/h2>
&lt;p>Often, after you&amp;rsquo;ve branched your &lt;code>PCollection&lt;/code> into multiple &lt;code>PCollection&lt;/code>s via multiple transforms, you&amp;rsquo;ll want to merge some or all of those resulting &lt;code>PCollection&lt;/code>s back together. You can do so by using one of the following:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Flatten&lt;/strong> - You can use the &lt;code>Flatten&lt;/code> transform in the Beam SDKs to merge multiple &lt;code>PCollection&lt;/code>s of the &lt;strong>same type&lt;/strong>.&lt;/li>
&lt;li>&lt;strong>Join&lt;/strong> - You can use the &lt;code>CoGroupByKey&lt;/code> transform in the Beam SDK to perform a relational join between two &lt;code>PCollection&lt;/code>s. The &lt;code>PCollection&lt;/code>s must be keyed (i.e. they must be collections of key/value pairs) and they must use the same key type.&lt;/li>
&lt;/ul>
&lt;p>The example in figure 4 is a continuation of the example in figure 2 in &lt;a href="#multiple-transforms-process-the-same-pcollection">the
section above&lt;/a>. After
branching into two &lt;code>PCollection&lt;/code>s, one with names that begin with &amp;lsquo;A&amp;rsquo; and one
with names that begin with &amp;lsquo;B&amp;rsquo;, the pipeline merges the two together into a
single &lt;code>PCollection&lt;/code> that now contains all names that begin with either &amp;lsquo;A&amp;rsquo; or
&amp;lsquo;B&amp;rsquo;. Here, it makes sense to use &lt;code>Flatten&lt;/code> because the &lt;code>PCollection&lt;/code>s being
merged both contain the same type.&lt;/p>
&lt;p>&lt;img src="/images/design-your-pipeline-flatten.svg" alt="The pipeline merges two collections into one collection with the Flatten transform.">&lt;/p>
&lt;p>&lt;em>Figure 4: A pipeline that merges two collections into one collection with the Flatten transform.&lt;/em>&lt;/p>
&lt;p>The following example code applies &lt;code>Flatten&lt;/code> to merge two collections.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">//merge the two PCollections with Flatten
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollectionList&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">collectionList&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">PCollectionList&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">aCollection&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">bCollection&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">mergedCollectionWithFlatten&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">collectionList&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Flatten&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">pCollections&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// continue with the new merged PCollection
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">mergedCollectionWithFlatten&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(...);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;h2 id="multiple-sources">Multiple sources&lt;/h2>
&lt;p>Your pipeline can read its input from one or more sources. If your pipeline reads from multiple sources and the data from those sources is related, it can be useful to join the inputs together. In the example illustrated in figure 5 below, the pipeline reads names and addresses from a database table, and names and order numbers from a Kafka topic. The pipeline then uses &lt;code>CoGroupByKey&lt;/code> to join this information, where the key is the name; the resulting &lt;code>PCollection&lt;/code> contains all the combinations of names, addresses, and orders.&lt;/p>
&lt;p>&lt;img src="/images/design-your-pipeline-join.svg" alt="The pipeline joins two input collections into one collection with the Join transform.">&lt;/p>
&lt;p>&lt;em>Figure 5: A pipeline that does a relational join of two input collections.&lt;/em>&lt;/p>
&lt;p>The following example code applies &lt;code>CoGroupByKey&lt;/code> to join two input collections.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">userAddress&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">JdbcIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()...);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">userOrder&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">KafkaIO&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">read&lt;/span>&lt;span class="o">()...);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">addressTag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">final&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">orderTag&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">TupleTag&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Merge collection values into a CoGbkResult collection.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">KV&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">CoGbkResult&lt;/span>&lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">joinedCollection&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">KeyedPCollectionTuple&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">addressTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">userAddress&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">and&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">orderTag&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">userOrder&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">CoGroupByKey&lt;/span>&lt;span class="o">.&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span>&lt;span class="n">create&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">joinedCollection&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(...);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
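&lt;p>To make the shape of the joined result more concrete, the following is a minimal plain-Java sketch of the grouping that the text above describes. It does not use the Beam SDK: the &lt;code>Kv&lt;/code> record, the &lt;code>coGroup&lt;/code> helper, and the sample names, addresses, and orders are all hypothetical stand-ins. In a Beam pipeline, a downstream &lt;code>DoFn&lt;/code> would instead read the values for each tag from the &lt;code>CoGbkResult&lt;/code>, for example with &lt;code>getAll(addressTag)&lt;/code>.&lt;/p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CoGroupSketch {
    // Hypothetical stand-in for a Beam KV of String to String.
    record Kv(String key, String value) {}

    // Groups both inputs by key and renders one line per key. A key that
    // appears in only one input still gets a line; the other list is empty.
    static String coGroup(Kv[] addresses, Kv[] orders) {
        var addressesByKey = Arrays.stream(addresses).collect(
            Collectors.groupingBy(Kv::key, Collectors.mapping(Kv::value, Collectors.toList())));
        var ordersByKey = Arrays.stream(orders).collect(
            Collectors.groupingBy(Kv::key, Collectors.mapping(Kv::value, Collectors.toList())));

        // Union of the keys from both inputs, sorted so the output is deterministic.
        var keys = Stream.concat(addressesByKey.keySet().stream(), ordersByKey.keySet().stream())
            .distinct()
            .sorted()
            .collect(Collectors.toList());

        var out = new StringBuilder();
        for (var key : keys) {
            out.append(key)
                .append(" addresses=").append(addressesByKey.getOrDefault(key, List.of()))
                .append(" orders=").append(ordersByKey.getOrDefault(key, List.of()))
                .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Kv[] addresses = { new Kv("Amy", "1 Elm St"), new Kv("Bob", "2 Oak Ave") };
        Kv[] orders = {
            new Kv("Amy", "order-1"), new Kv("Amy", "order-2"), new Kv("Carl", "order-3")
        };
        // Prints:
        // Amy addresses=[1 Elm St] orders=[order-1, order-2]
        // Bob addresses=[2 Oak Ave] orders=[]
        // Carl addresses=[] orders=[order-3]
        System.out.print(coGroup(addresses, orders));
    }
}
```

&lt;p>The sketch sorts keys only to keep its printed output stable; a &lt;code>PCollection&lt;/code> makes no ordering guarantee.&lt;/p>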
&lt;h2 id="whats-next">What&amp;rsquo;s next&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/pipelines/create-your-pipeline">Create your own pipeline&lt;/a>.&lt;/li>
&lt;li>&lt;a href="/documentation/pipelines/test-your-pipeline">Test your pipeline&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>Documentation: Distinct</title><link>/documentation/transforms/java/aggregation/distinct/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/java/aggregation/distinct/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="distinct">Distinct&lt;/h1>
&lt;table align="left">
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Distinct.html">
&lt;img src="/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
&lt;/a>
&lt;/table>
&lt;br>&lt;br>
&lt;p>Produces a collection containing distinct elements of the input collection.&lt;/p>
&lt;p>On some data sets, it might be more efficient to compute an approximate
answer using &lt;code>ApproximateUnique&lt;/code>, which also allows for determining distinct
values for each key.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>&lt;strong>Example 1&lt;/strong>: Find the distinct elements in a &lt;code>PCollection&lt;/code> of &lt;code>String&lt;/code>.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="kd">static&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">[]&lt;/span> &lt;span class="n">WORDS_ARRAY&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">new&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">[]{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;hi&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;hi&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;sue&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s">&amp;#34;sue&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;bob&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kd">static&lt;/span> &lt;span class="kd">final&lt;/span> &lt;span class="n">List&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">WORDS&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Arrays&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">asList&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">WORDS_ARRAY&lt;/span>&lt;span class="o">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">input&lt;/span> &lt;span class="o">=&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">WORDS&lt;/span>&lt;span class="o">).&lt;/span>&lt;span class="na">withCoder&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">StringUtf8Coder&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">()));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">distinctWords&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Distinct&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">create&lt;/span>&lt;span class="o">());&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
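&lt;p>As a rough illustration of the result to expect from Example 1, the following plain-Java sketch removes duplicates from the same five words. It is not the Beam API: the &lt;code>DistinctSketch&lt;/code> class and its &lt;code>distinct&lt;/code> helper are hypothetical, and the Java stream keeps first-seen order, whereas a &lt;code>PCollection&lt;/code> has no guaranteed element order.&lt;/p>

```java
import java.util.Arrays;

public class DistinctSketch {
    // Plain-Java stand-in for what Distinct computes: each value is kept once.
    static String[] distinct(String[] words) {
        return Arrays.stream(words).distinct().toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] words = {"hi", "hi", "sue", "sue", "bob"};
        // Prints [hi, sue, bob]
        System.out.println(Arrays.toString(distinct(words)));
    }
}
```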
&lt;p>&lt;strong>Example 2&lt;/strong>: Find the distinct elements in a &lt;code>PCollection&lt;/code> of &lt;code>Integer&lt;/code>.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_Distinct"
data-show="main_section"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_Distinct%22%2c%22sdk%22%3a%22java%22%2c%22show%22%3a%22main_section%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/count">Count&lt;/a>
counts the number of elements within each aggregation.&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/aggregation/approximateunique">ApproximateUnique&lt;/a>
estimates the number of distinct elements in a collection.&lt;/li>
&lt;/ul></description></item><item><title>Documentation: Distinct</title><link>/documentation/transforms/python/aggregation/distinct/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/aggregation/distinct/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="distinct">Distinct&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.Distinct"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;p>Produces a collection containing distinct elements of the input collection.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>In the following example, we create a pipeline with two &lt;code>PCollection&lt;/code>s of produce.&lt;/p>
&lt;p>We use &lt;code>Distinct&lt;/code> to get rid of duplicate elements, which outputs a &lt;code>PCollection&lt;/code> of all the unique elements.&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-py playground-snippet"
data-sdk="python"
data-path="SDK_PYTHON_Distinct"
data-show="distinct"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_PYTHON_Distinct%22%2c%22sdk%22%3a%22python%22%2c%22show%22%3a%22distinct%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/python/aggregation/count">Count&lt;/a> counts the number of elements within each aggregation.&lt;/li>
&lt;/ul>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p></description></item><item><title>Documentation: Enrichment</title><link>/documentation/transforms/python/elementwise/enrichment/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/elementwise/enrichment/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="enrichment-transform">Enrichment transform&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table>
&lt;tr>
&lt;td>
&lt;a>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.Enrichment"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;/a>
&lt;/td>
&lt;/tr>
&lt;/table>
&lt;p>The enrichment transform lets you dynamically enrich data in a pipeline by doing a key-value lookup against a remote service. The transform uses &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.requestresponseio.html#apache_beam.io.requestresponseio.RequestResponseIO">&lt;code>RequestResponseIO&lt;/code>&lt;/a> internally. This feature uses client-side throttling to ensure that the remote service isn&amp;rsquo;t overloaded with requests. If service-side errors occur, such as &lt;code>TooManyRequests&lt;/code> and &lt;code>Timeout&lt;/code> exceptions, it retries the requests by using exponential backoff.&lt;/p>
&lt;p>This transform is available in Apache Beam 2.54.0 and later versions.&lt;/p>
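&lt;p>To make the retry behavior described above concrete, the following is a minimal sketch of exponential backoff in plain Python. It is an illustration only, not the transform's actual implementation: the &lt;code>call_with_backoff&lt;/code> helper, its parameters and default values, the jitter factor, and the local &lt;code>TooManyRequests&lt;/code> class are all hypothetical names chosen for this example.&lt;/p>

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for a service-side throttling error."""


def call_with_backoff(
    request_fn, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep
):
    """Call request_fn, retrying retryable errors with exponential backoff.

    All parameter names and defaults are assumptions made for this sketch.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except (TooManyRequests, TimeoutError):
            # Give up and re-raise once the last allowed attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Double the wait after every failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

&lt;p>Passing &lt;code>sleep&lt;/code> as a parameter is a design choice for this sketch: it lets a caller or a test substitute a fake sleeper and observe the delays without actually waiting.&lt;/p>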
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>The following examples demonstrate how to create a pipeline that uses the enrichment transform to enrich data with values from external services.&lt;/p>
&lt;div class="table-wrapper">&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">Service&lt;/th>
&lt;th style="text-align:left">Example&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">Cloud Bigtable&lt;/td>
&lt;td style="text-align:left">&lt;a href="/documentation/transforms/python/elementwise/enrichment-bigtable/#example">Enrichment with Bigtable&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Vertex AI Feature Store&lt;/td>
&lt;td style="text-align:left">&lt;a href="/documentation/transforms/python/elementwise/enrichment-vertexai/#example-1-enrichment-with-vertex-ai-feature-store">Enrichment with Vertex AI Feature Store&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Vertex AI Feature Store (Legacy)&lt;/td>
&lt;td style="text-align:left">&lt;a href="/documentation/transforms/python/elementwise/enrichment-vertexai/#example-2-enrichment-with-vertex-ai-feature-store-legacy">Enrichment with Legacy Vertex AI Feature Store&lt;/a>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;p>Not applicable.&lt;/p>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.Enrichment"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p></description></item><item><title>Documentation: Enrichment with Bigtable</title><link>/documentation/transforms/python/elementwise/enrichment-bigtable/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/elementwise/enrichment-bigtable/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="use-bigtable-to-enrich-data">Use Bigtable to enrich data&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table>
&lt;tr>
&lt;td>
&lt;a>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;/a>
&lt;/td>
&lt;/tr>
&lt;/table>
&lt;p>In Apache Beam 2.54.0 and later versions, the enrichment transform includes a built-in enrichment handler for &lt;a href="https://cloud.google.com/bigtable/docs/overview">Bigtable&lt;/a>.
The following example demonstrates how to create a pipeline that uses the enrichment transform with the &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler">&lt;code>BigTableEnrichmentHandler&lt;/code>&lt;/a> handler.&lt;/p>
&lt;p>The data stored in the Bigtable cluster uses the following format:&lt;/p>
&lt;div class="table-wrapper">&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">Row key&lt;/th>
&lt;th style="text-align:center">product:product_id&lt;/th>
&lt;th style="text-align:center">product:product_name&lt;/th>
&lt;th style="text-align:center">product:product_stock&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:center">pixel 5&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">2&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;td style="text-align:center">pixel 6&lt;/td>
&lt;td style="text-align:center">4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">3&lt;/td>
&lt;td style="text-align:center">3&lt;/td>
&lt;td style="text-align:center">pixel 7&lt;/td>
&lt;td style="text-align:center">20&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">4&lt;/td>
&lt;td style="text-align:center">4&lt;/td>
&lt;td style="text-align:center">pixel 8&lt;/td>
&lt;td style="text-align:center">10&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.transforms.enrichment&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Enrichment&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.transforms.enrichment_handlers.bigtable&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">BigTableEnrichmentHandler&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">project_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;apache-beam-testing&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">instance_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;beam-test&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">table_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;bigtable-enrichment-test&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">row_key&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;product_id&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">sale_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">customer_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">product_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">quantity&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">sale_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">customer_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">product_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">quantity&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">sale_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">customer_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">product_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">4&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">quantity&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">bigtable_handler&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">BigTableEnrichmentHandler&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">project_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">project_id&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">instance_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">instance_id&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">table_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">table_id&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">row_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">row_key&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Create&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Enrich W/ BigTable&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">Enrichment&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bigtable_handler&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Print&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">print&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>
&lt;p class="notebook-skip">Output:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>Row(sale_id=1, customer_id=1, product_id=1, quantity=1, product={&amp;#39;product_id&amp;#39;: &amp;#39;1&amp;#39;, &amp;#39;product_name&amp;#39;: &amp;#39;pixel 5&amp;#39;, &amp;#39;product_stock&amp;#39;: &amp;#39;2&amp;#39;})
Row(sale_id=3, customer_id=3, product_id=2, quantity=3, product={&amp;#39;product_id&amp;#39;: &amp;#39;2&amp;#39;, &amp;#39;product_name&amp;#39;: &amp;#39;pixel 6&amp;#39;, &amp;#39;product_stock&amp;#39;: &amp;#39;4&amp;#39;})
Row(sale_id=5, customer_id=5, product_id=4, quantity=2, product={&amp;#39;product_id&amp;#39;: &amp;#39;4&amp;#39;, &amp;#39;product_name&amp;#39;: &amp;#39;pixel 8&amp;#39;, &amp;#39;product_stock&amp;#39;: &amp;#39;10&amp;#39;})&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
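&lt;p>The shape of the output above, where the looked-up columns are nested under a field named after the &lt;code>product&lt;/code> column family, can be mimicked without any cloud service. The following sketch uses an in-memory dictionary as a stand-in for the Bigtable table; it is not the handler's implementation and does not use the Bigtable client. The &lt;code>FAKE_TABLE&lt;/code> and &lt;code>enrich&lt;/code> names are hypothetical, and leaving a row unchanged when its key is missing is an assumption made for this example.&lt;/p>

```python
# In-memory stand-in for the Bigtable table shown above (not a Bigtable client).
# Layout: row key -> column family -> column qualifier -> value.
FAKE_TABLE = {
    "1": {"product": {"product_id": "1", "product_name": "pixel 5", "product_stock": "2"}},
    "2": {"product": {"product_id": "2", "product_name": "pixel 6", "product_stock": "4"}},
    "3": {"product": {"product_id": "3", "product_name": "pixel 7", "product_stock": "20"}},
    "4": {"product": {"product_id": "4", "product_name": "pixel 8", "product_stock": "10"}},
}


def enrich(row, row_key, table):
    """Return a copy of row with one extra field per matching column family.

    Assumption for this sketch: a row whose key is not found is returned
    unchanged.
    """
    # The value of the row_key field is used as the lookup key in the table.
    key = str(row[row_key])
    enriched = dict(row)
    enriched.update(table.get(key, {}))
    return enriched


sale = {"sale_id": 1, "customer_id": 1, "product_id": 1, "quantity": 1}
print(enrich(sale, "product_id", FAKE_TABLE))
```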
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;p>Not applicable.&lt;/p>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p></description></item><item><title>Documentation: Enrichment with Vertex AI Feature Store</title><link>/documentation/transforms/python/elementwise/enrichment-vertexai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/python/elementwise/enrichment-vertexai/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="enrichment-with-google-cloud-vertex-ai-feature-store">Enrichment with Google Cloud Vertex AI Feature Store&lt;/h1>
&lt;script type="text/javascript">
localStorage.setItem("language", "language-py")
&lt;/script>
&lt;table>
&lt;tr>
&lt;td>
&lt;a>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store.html#apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store.VertexAIFeatureStoreEnrichmentHandler"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p>
&lt;/a>
&lt;/td>
&lt;/tr>
&lt;/table>
&lt;p>In Apache Beam 2.55.0 and later versions, the enrichment transform includes a built-in enrichment handler for &lt;a href="https://cloud.google.com/vertex-ai/docs/featurestore">Vertex AI Feature Store&lt;/a>.
The following examples demonstrate how to create pipelines that use the enrichment transform with the &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store.html#apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store.VertexAIFeatureStoreEnrichmentHandler">&lt;code>VertexAIFeatureStoreEnrichmentHandler&lt;/code>&lt;/a> handler and the &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store.html#apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store.VertexAIFeatureStoreLegacyEnrichmentHandler">&lt;code>VertexAIFeatureStoreLegacyEnrichmentHandler&lt;/code>&lt;/a> handler.&lt;/p>
&lt;h2 id="example-1-enrichment-with-vertex-ai-feature-store">Example 1: Enrichment with Vertex AI Feature Store&lt;/h2>
&lt;p>The precomputed feature values stored in Vertex AI Feature Store use the following format:&lt;/p>
&lt;div class="table-wrapper">&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">user_id&lt;/th>
&lt;th style="text-align:center">age&lt;/th>
&lt;th style="text-align:center">gender&lt;/th>
&lt;th style="text-align:center">state&lt;/th>
&lt;th style="text-align:center">country&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">21422&lt;/td>
&lt;td style="text-align:center">12&lt;/td>
&lt;td style="text-align:center">0&lt;/td>
&lt;td style="text-align:center">0&lt;/td>
&lt;td style="text-align:center">0&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">2963&lt;/td>
&lt;td style="text-align:center">12&lt;/td>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:center">1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">20592&lt;/td>
&lt;td style="text-align:center">12&lt;/td>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">76538&lt;/td>
&lt;td style="text-align:center">12&lt;/td>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:center">3&lt;/td>
&lt;td style="text-align:center">0&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.transforms.enrichment&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Enrichment&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store&lt;/span> \
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kn">import&lt;/span> &lt;span class="nn">VertexAIFeatureStoreEnrichmentHandler&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">project_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;apache-beam-testing&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">location&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;us-central1&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">api_endpoint&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">location&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">-aiplatform.googleapis.com&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">user_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;2963&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">product_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">14235&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">sale_price&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">15.0&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">user_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;21422&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">product_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">11203&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">sale_price&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">12.0&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">user_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;20592&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">product_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">8579&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">sale_price&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">9.0&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">vertex_ai_handler&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">VertexAIFeatureStoreEnrichmentHandler&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">project&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">project_id&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">location&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">location&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">api_endpoint&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">api_endpoint&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">feature_store_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;vertexai_enrichment_example&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">feature_view_name&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;users&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">row_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;user_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Create&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Enrich W/ Vertex AI&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">Enrichment&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">vertex_ai_handler&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Print&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">print&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>
&lt;p class="notebook-skip">Output:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>Row(user_id=&amp;#39;2963&amp;#39;, product_id=14235, sale_price=15.0, age=12.0, state=&amp;#39;1&amp;#39;, gender=&amp;#39;1&amp;#39;, country=&amp;#39;1&amp;#39;)
Row(user_id=&amp;#39;21422&amp;#39;, product_id=11203, sale_price=12.0, age=12.0, state=&amp;#39;0&amp;#39;, gender=&amp;#39;0&amp;#39;, country=&amp;#39;0&amp;#39;)
Row(user_id=&amp;#39;20592&amp;#39;, product_id=8579, sale_price=9.0, age=12.0, state=&amp;#39;2&amp;#39;, gender=&amp;#39;1&amp;#39;, country=&amp;#39;2&amp;#39;)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h2 id="example-2-enrichment-with-vertex-ai-feature-store-legacy">Example 2: Enrichment with Vertex AI Feature Store (legacy)&lt;/h2>
&lt;p>The precomputed feature values stored in Vertex AI Feature Store (Legacy) use the following format:&lt;/p>
&lt;div class="table-wrapper">&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">entity_id&lt;/th>
&lt;th style="text-align:left">title&lt;/th>
&lt;th style="text-align:left">genres&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">movie_01&lt;/td>
&lt;td style="text-align:left">The Shawshank Redemption&lt;/td>
&lt;td style="text-align:left">Drama&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">movie_02&lt;/td>
&lt;td style="text-align:left">The Shining&lt;/td>
&lt;td style="text-align:left">Horror&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">movie_04&lt;/td>
&lt;td style="text-align:left">The Dark Knight&lt;/td>
&lt;td style="text-align:left">Action&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.transforms.enrichment&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">Enrichment&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store&lt;/span> \
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kn">import&lt;/span> &lt;span class="nn">VertexAIFeatureStoreLegacyEnrichmentHandler&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">project_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;apache-beam-testing&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">location&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s1">&amp;#39;us-central1&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">api_endpoint&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">location&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">-aiplatform.googleapis.com&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">entity_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;movie_01&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">title&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;The Shawshank Redemption&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">entity_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;movie_02&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">title&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;The Shining&amp;#34;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Row&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">entity_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;movie_04&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">title&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;The Dark Knight&amp;#39;&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">vertex_ai_handler&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">VertexAIFeatureStoreLegacyEnrichmentHandler&lt;/span>&lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">project&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">project_id&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">location&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">location&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">api_endpoint&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">api_endpoint&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">entity_type_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;movies&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">feature_store_id&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;movie_prediction_unique&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">feature_ids&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;title&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;genres&amp;#34;&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">row_key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;entity_id&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">p&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">_&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">p&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Create&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">data&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Enrich W/ Vertex AI&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">Enrichment&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">vertex_ai_handler&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="s2">&amp;#34;Print&amp;#34;&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">print&lt;/span>&lt;span class="p">))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p>
&lt;p class="notebook-skip">Output:&lt;/p>
&lt;div class="snippet">
&lt;div class="notebook-skip code-snippet without_switcher">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;pre tabindex="0">&lt;code>Row(entity_id=&amp;#39;movie_01&amp;#39;, title=&amp;#39;The Shawshank Redemption&amp;#39;, genres=&amp;#39;Drama&amp;#39;)
Row(entity_id=&amp;#39;movie_02&amp;#39;, title=&amp;#39;The Shining&amp;#39;, genres=&amp;#39;Horror&amp;#39;)
Row(entity_id=&amp;#39;movie_04&amp;#39;, title=&amp;#39;The Dark Knight&amp;#39;, genres=&amp;#39;Action&amp;#39;)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;p>Not applicable.&lt;/p>
&lt;table align="left" style="margin-right:1em">
&lt;td>
&lt;a
class="button"
target="_blank"
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store.html#apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store.VertexAIFeatureStoreEnrichmentHandler"
>&lt;img
src="https://beam.apache.org/images/logos/sdks/python.png"
width="32px"
height="32px"
alt="Pydoc"
/>
Pydoc&lt;/a
>
&lt;/td>
&lt;/table>
&lt;p>&lt;br>&lt;br>&lt;br>&lt;/p></description></item><item><title>Documentation: Execution model</title><link>/documentation/runtime/model/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/runtime/model/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="execution-model">Execution model&lt;/h1>
&lt;p>The Beam model allows runners to execute your pipeline in different ways. You
may observe various effects as a result of the runner’s choices. This page
describes these effects so you can better understand how Beam pipelines execute.&lt;/p>
&lt;h2 id="processing-of-elements">Processing of elements&lt;/h2>
&lt;p>The serialization and communication of elements between machines is one of the
most expensive operations in a distributed execution of your pipeline. Avoiding
this serialization may require re-processing elements after failures or may
limit the distribution of output to other machines.&lt;/p>
&lt;h3 id="serialization-and-communication">Serialization and communication&lt;/h3>
&lt;p>The runner might serialize elements between machines for communication purposes
and for other reasons such as persistence.&lt;/p>
&lt;p>A runner may decide to transfer elements between transforms in a variety of
ways, such as:&lt;/p>
&lt;ul>
&lt;li>Routing elements to a worker for processing as part of a grouping operation.
This may involve serializing elements and grouping or sorting them by their
key.&lt;/li>
&lt;li>Redistributing elements between workers to adjust parallelism. This may
involve serializing elements and communicating them to other workers.&lt;/li>
&lt;li>Using the elements in a side input to a &lt;code>ParDo&lt;/code>. This may require
serializing the elements and broadcasting them to all the workers executing
the &lt;code>ParDo&lt;/code>.&lt;/li>
&lt;li>Passing elements between transforms that are running on the same worker.
This may allow the runner to avoid serializing elements; instead, the runner
can just pass the elements in memory. This is done as part of an
optimization that is known as
&lt;a href="/documentation/glossary/#fusion">fusion&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Some situations where the runner may serialize and persist elements are:&lt;/p>
&lt;ol>
&lt;li>When used as part of a stateful &lt;code>DoFn&lt;/code>, the runner may persist values to some
state mechanism.&lt;/li>
&lt;li>When committing the results of processing, the runner may persist the outputs
as a checkpoint.&lt;/li>
&lt;/ol>
&lt;h3 id="bundling-and-persistence">Bundling and persistence&lt;/h3>
&lt;p>Beam pipelines often focus on &amp;ldquo;&lt;a href="https://en.wikipedia.org/wiki/embarrassingly_parallel">embarrassingly parallel&lt;/a>&amp;rdquo;
problems. Because of this, the APIs emphasize processing elements in parallel,
which makes it difficult to express actions like &amp;ldquo;assign a sequence number to
each element in a PCollection&amp;rdquo;. This is intentional as such algorithms are much
more likely to suffer from scalability problems.&lt;/p>
&lt;p>Processing all elements in parallel also has some drawbacks. Specifically, it
makes it impossible to batch any operations, such as writing elements to a sink
or checkpointing progress during processing.&lt;/p>
&lt;p>Instead of processing all elements simultaneously, the elements in a
&lt;code>PCollection&lt;/code> are processed in &lt;em>bundles&lt;/em>. The division of the collection into
bundles is arbitrary and selected by the runner. This allows the runner to
choose an appropriate middle-ground between persisting results after every
element, and having to retry everything if there is a failure. For example, a
streaming runner may prefer to process and commit small bundles, and a batch
runner may prefer to process larger bundles.&lt;/p>
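&lt;p>The trade-off can be pictured with a toy sketch in plain Python. This is not Beam API and not how any runner actually chooses bundles; the helper &lt;code>split_into_bundles&lt;/code> and the fixed bundle sizes are assumptions made only for illustration.&lt;/p>

```python
from typing import Iterator, List, Sequence


def split_into_bundles(elements: Sequence[int], bundle_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `elements`, each at most `bundle_size` long.

    Hypothetical helper: a real runner picks bundle boundaries arbitrarily.
    """
    if not bundle_size >= 1:
        raise ValueError("bundle_size must be at least 1")
    for start in range(0, len(elements), bundle_size):
        yield list(elements[start:start + bundle_size])


elements = list(range(9))

# Small bundles: results can be committed often, little work is lost on failure.
streaming_style = list(split_into_bundles(elements, 2))

# Larger bundles: fewer commits, but more work to redo if a bundle fails.
batch_style = list(split_into_bundles(elements, 5))

print([len(b) for b in streaming_style])
print([len(b) for b in batch_style])
```

&lt;p>Whatever the bundle size, every element belongs to exactly one bundle, so the union of the bundles is the original collection.&lt;/p>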
&lt;h3 id="data-partitioning-and-inter-stage-execution">Data partitioning and inter-stage execution&lt;/h3>
&lt;p>Partitioning and parallelization of element processing within a Beam pipeline is
dependent on two things:&lt;/p>
&lt;ul>
&lt;li>Data source implementation&lt;/li>
&lt;li>Inter-stage key parallelism&lt;/li>
&lt;/ul>
&lt;p>Beam pipelines read data from a source (e.g. &lt;code>KafkaIO&lt;/code>, &lt;code>BigQueryIO&lt;/code>, &lt;code>JdbcIO&lt;/code>,
or your own source implementation). To implement a Source in Beam one must
implement it as a Splittable &lt;code>DoFn&lt;/code>. A Splittable &lt;code>DoFn&lt;/code> provides the runner
with interfaces to facilitate the splitting of work.&lt;/p>
&lt;p>When running key-based operations in Beam (e.g. &lt;code>GroupByKey&lt;/code>, &lt;code>Combine&lt;/code>,
&lt;code>Reshuffle.perKey&lt;/code>, and stateful &lt;code>DoFn&lt;/code>s), Beam runners perform serialization
and transfer of data known as &lt;em>shuffle&lt;/em>&lt;sup>1&lt;/sup>. Shuffle allows data
elements of the same key to be processed together.&lt;/p>
&lt;p>The way in which runners &lt;em>shuffle&lt;/em> data may be slightly different for Batch and
Streaming execution modes.&lt;/p>
&lt;p>&lt;sup>1&lt;/sup>Not to be confused with the &lt;code>shuffle&lt;/code> operation in some runners.&lt;/p>
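&lt;p>As a rough mental model only, the effect of a shuffle can be sketched in plain Python: every element is routed to a destination chosen from its key, so all values for one key end up together. The function &lt;code>toy_shuffle&lt;/code>, the CRC-based routing, and the worker count are assumptions for illustration; real runners use their own partitioning and transport.&lt;/p>

```python
import zlib
from collections import defaultdict


def toy_shuffle(keyed_elements, num_workers):
    """Route each (key, value) pair to a worker chosen from the key alone."""
    workers = [defaultdict(list) for _ in range(num_workers)]
    for key, value in keyed_elements:
        # crc32 is stable across processes, unlike the built-in hash() on str.
        index = zlib.crc32(key.encode("utf-8")) % num_workers
        workers[index][key].append(value)
    return workers


events = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
workers = toy_shuffle(events, num_workers=3)

# For every key, list the workers that hold at least one of its values.
owners = {
    key: [i for i, worker in enumerate(workers) if key in worker]
    for key, _ in events
}
print(owners)
```

&lt;p>In this sketch each key is owned by exactly one worker, which is what lets a grouping operation see all values for that key in one place.&lt;/p>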
&lt;h4 id="data-ordering-in-a-pipeline-execution">Data ordering in a pipeline execution&lt;/h4>
&lt;p>The Beam model does not define strict guidelines regarding the order in which
runners process elements or transport them across &lt;code>PTransforms&lt;/code>. Runners are
free to implement data transfer semantics in different forms.&lt;/p>
&lt;p>Some use cases exist where user pipelines may need to rely on specific ordering
semantics in pipeline execution. The &lt;a href="/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/index.html">capability matrix documents&lt;/a>
runner behavior for &lt;strong>key-ordered delivery&lt;/strong>.&lt;/p>
&lt;p>Consider a single Beam worker processing a series of bundles from the same Beam
transform, and consider a &lt;code>PTransform&lt;/code> that outputs data from this Stage into a
downstream &lt;code>PCollection&lt;/code>. Finally, consider two events &lt;em>with the same key&lt;/em>
emitted in a certain order by this worker (within the same bundle or as part of
different bundles).&lt;/p>
&lt;p>We say that the Beam runner supports &lt;strong>key-ordered delivery&lt;/strong> if it guarantees
that these two events will be observed in the same order by a &lt;code>PTransform&lt;/code> that is
immediately downstream, regardless of the data transmission method.&lt;/p>
&lt;p>This characteristic will hold true in runners and operations that have
key-limited parallelism.&lt;/p>
&lt;h2 id="parallelism">Failures and parallelism within and between transforms&lt;/h2>
&lt;p>In this section, we discuss how elements in the input collection are processed
in parallel, and how transforms are retried when failures occur.&lt;/p>
&lt;h3 id="data-parallelism">Data-parallelism within one transform&lt;/h3>
&lt;p>When executing a single &lt;code>ParDo&lt;/code>, a runner might divide an example input
collection of nine elements into two bundles as shown in figure 1.&lt;/p>
&lt;p>&lt;img src="/images/execution_model_bundling.svg" alt="Bundle A contains five elements. Bundle B contains four elements.">&lt;/p>
&lt;p>&lt;em>Figure 1: A runner divides an input collection into two bundles.&lt;/em>&lt;/p>
&lt;p>When the &lt;code>ParDo&lt;/code> executes, workers may process the two bundles in parallel as
shown in figure 2.&lt;/p>
&lt;p>&lt;img src="/images/execution_model_bundling_gantt.svg" alt="Two workers process the two bundles in parallel. Worker one processes bundle A. Worker two processes bundle B.">&lt;/p>
&lt;p>&lt;em>Figure 2: Two workers process the two bundles in parallel.&lt;/em>&lt;/p>
&lt;p>Since elements cannot be split, the maximum parallelism for a transform depends
on the number of elements in the collection. In figure 3, the input collection
has nine elements, so the maximum parallelism is nine.&lt;/p>
&lt;p>&lt;img src="/images/execution_model_bundling_gantt_max.svg" alt="Nine workers process a nine element input collection in parallel.">&lt;/p>
&lt;p>&lt;em>Figure 3: Nine workers process a nine element input collection in parallel.&lt;/em>&lt;/p>
&lt;p>Note: Splittable ParDo allows splitting the processing of a single input across
multiple bundles. This feature is a work in progress.&lt;/p>
&lt;h3 id="dependent-parallellism">Dependent-parallelism between transforms&lt;/h3>
&lt;p>&lt;code>ParDo&lt;/code> transforms that are in sequence may be &lt;em>dependently parallel&lt;/em> if the
runner chooses to execute the consuming transform on the producing transform&amp;rsquo;s
output elements without altering the bundling. In figure 4, &lt;code>ParDo1&lt;/code> and
&lt;code>ParDo2&lt;/code> are &lt;em>dependently parallel&lt;/em> if the output of &lt;code>ParDo1&lt;/code> for a given
element must be processed on the same worker.&lt;/p>
&lt;p>&lt;img src="/images/execution_model_bundling_multi.svg" alt="ParDo1 processes an input collection that contains bundles A and B. ParDo2 then processes the output collection from ParDo1, which contains bundles C and D.">&lt;/p>
&lt;p>&lt;em>Figure 4: Two transforms in sequence and their corresponding input collections.&lt;/em>&lt;/p>
&lt;p>Figure 5 shows how these dependently parallel transforms might execute. The
first worker executes &lt;code>ParDo1&lt;/code> on the elements in bundle A (which results in
bundle C), and then executes &lt;code>ParDo2&lt;/code> on the elements in bundle C. Similarly,
the second worker executes &lt;code>ParDo1&lt;/code> on the elements in bundle B (which results
in bundle D), and then executes &lt;code>ParDo2&lt;/code> on the elements in bundle D.&lt;/p>
&lt;p>&lt;img src="/images/execution_model_bundling_multi_gantt.svg" alt="Worker one executes ParDo1 on bundle A and Pardo2 on bundle C. Worker two executes ParDo1 on bundle B and ParDo2 on bundle D.">&lt;/p>
&lt;p>&lt;em>Figure 5: Two workers execute dependently parallel ParDo transforms.&lt;/em>&lt;/p>
&lt;p>Executing transforms this way allows a runner to avoid redistributing elements
between workers, which saves on communication costs. However, the maximum parallelism
now depends on the maximum parallelism of the first of the dependently parallel
steps.&lt;/p>
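&lt;p>The execution in figure 5 can be imitated with a small plain-Python sketch. It is a simulation under stated assumptions, not Beam code: &lt;code>par_do_1&lt;/code>, &lt;code>par_do_2&lt;/code>, and &lt;code>run_fused&lt;/code> are hypothetical stand-ins, and a thread pool plays the role of two workers.&lt;/p>

```python
from concurrent.futures import ThreadPoolExecutor


def par_do_1(x):
    """Hypothetical first step."""
    return x + 1


def par_do_2(y):
    """Hypothetical second step."""
    return y * 10


def run_fused(bundle):
    """Run both steps on one bundle on one worker, passing values in memory."""
    intermediate = [par_do_1(x) for x in bundle]   # e.g. bundle A -> bundle C
    return [par_do_2(y) for y in intermediate]     # e.g. bundle C -> output


bundle_a = [1, 2, 3, 4, 5]
bundle_b = [6, 7, 8, 9]

# Two "workers": parallelism is capped by the number of input bundles,
# because the second step never gets re-bundled on its own.
with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = list(pool.map(run_fused, [bundle_a, bundle_b]))

print(outputs)
```

&lt;p>No intermediate element leaves its worker here, which mirrors the saved communication cost; the price is that the second step cannot use more workers than the first.&lt;/p>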
&lt;h3 id="failures-within-one-transform">Failures within one transform&lt;/h3>
&lt;p>If processing of an element within a bundle fails, the entire bundle fails. The
elements in the bundle must be retried (otherwise the entire pipeline fails),
although they do not need to be retried with the same bundling.&lt;/p>
&lt;p>For this example, we will use the &lt;code>ParDo&lt;/code> from figure 1 that has an input
collection with nine elements and is divided into two bundles.&lt;/p>
&lt;p>In figure 6, the first worker successfully processes all five elements in bundle
A. The second worker processes the four elements in bundle B: the first two
elements were successfully processed, the third element’s processing failed, and
there is one element still awaiting processing.&lt;/p>
&lt;p>We see that the runner retries all elements in bundle B and the processing
completes successfully the second time. Note that the retry does not necessarily
happen on the same worker as the original processing attempt, as shown in the
figure.&lt;/p>
&lt;p>&lt;img src="/images/execution_model_failure_retry.svg" alt="Worker two fails to process an element in bundle B. Worker one finishes processing bundle A and then successfully retries to execute bundle B.">&lt;/p>
&lt;p>&lt;em>Figure 6: The processing of an element within bundle B fails, and another worker
retries the entire bundle.&lt;/em>&lt;/p>
&lt;p>Because we encountered a failure while processing an element in the input
bundle, we had to reprocess &lt;em>all&lt;/em> of the elements in the input bundle. This means
the runner must throw away the entire output of the bundle (including any state
mutations and set timers) since all of the results it contains will be recomputed.&lt;/p>
&lt;p>Note that if the failed transform is a &lt;code>ParDo&lt;/code>, then the &lt;code>DoFn&lt;/code> instance is torn
down and abandoned.&lt;/p>
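&lt;p>The retry behavior can be sketched in plain Python. This is a toy model, not a runner implementation: &lt;code>process_bundle&lt;/code>, &lt;code>run_with_retries&lt;/code>, &lt;code>make_flaky_do_fn&lt;/code>, and the attempt limit are all assumptions introduced for illustration.&lt;/p>

```python
def process_bundle(bundle, do_fn):
    """Process every element; an exception abandons the partial output."""
    output = []
    for element in bundle:
        output.append(do_fn(element))  # may raise
    return output


def run_with_retries(bundle, make_do_fn, max_attempts=3):
    """Retry the whole bundle, using a fresh do_fn instance per attempt."""
    for attempt in range(1, max_attempts + 1):
        do_fn = make_do_fn()  # the instance from a failed attempt is abandoned
        try:
            return process_bundle(bundle, do_fn), attempt
        except RuntimeError:
            continue  # partial output was local to process_bundle and is dropped
    raise RuntimeError("bundle failed after %d attempts" % max_attempts)


# Shared bookkeeping so the example can show which elements ran more than once.
calls = {"failures_left": 1, "processed": []}


def make_flaky_do_fn():
    def do_fn(element):
        if element == "b3" and calls["failures_left"] > 0:
            calls["failures_left"] -= 1
            raise RuntimeError("transient failure on " + element)
        calls["processed"].append(element)
        return element.upper()
    return do_fn


bundle_b = ["b1", "b2", "b3", "b4"]
result, attempts = run_with_retries(bundle_b, make_flaky_do_fn)
print(result, attempts, calls["processed"])
```

&lt;p>The first two elements are processed twice, once per attempt, which is why side effects inside a &lt;code>DoFn&lt;/code> should tolerate repetition.&lt;/p>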
&lt;h3 id="coupled-failure">Coupled failure: Failures between transforms&lt;/h3>
&lt;p>If a failure to process an element in &lt;code>ParDo2&lt;/code> causes &lt;code>ParDo1&lt;/code> to re-execute,
these two steps are said to be &lt;em>co-failing&lt;/em>.&lt;/p>
&lt;p>For this example, we will use the two &lt;code>ParDo&lt;/code>s from figure 4.&lt;/p>
&lt;p>In figure 7, worker two successfully executes &lt;code>ParDo1&lt;/code> on all elements in bundle
B. However, the worker fails to process an element in bundle D, so &lt;code>ParDo2&lt;/code>
fails (shown as the red X). As a result, the runner must discard and recompute
the output of &lt;code>ParDo2&lt;/code>. Because the runner was executing &lt;code>ParDo1&lt;/code> and &lt;code>ParDo2&lt;/code>
together, the output bundle from &lt;code>ParDo1&lt;/code> must also be thrown away, and all
elements in the input bundle must be retried. These two &lt;code>ParDo&lt;/code>s are co-failing.&lt;/p>
&lt;p>&lt;img src="/images/execution_model_bundling_coupled_failure.svg" alt="Worker two fails to process en element in bundle D, so all elements in both bundle B and bundle D must be retried.">&lt;/p>
&lt;p>&lt;em>Figure 7: Processing of an element within bundle D fails, so all elements in
the input bundle are retried.&lt;/em>&lt;/p>
&lt;p>Note that the retry does not necessarily have the same processing time as the
original attempt, as shown in the diagram.&lt;/p>
&lt;p>All &lt;code>DoFn&lt;/code>s that experience coupled failures are terminated and must be torn
down since they aren’t following the normal &lt;code>DoFn&lt;/code> lifecycle.&lt;/p>
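&lt;p>A toy plain-Python sketch of co-failing steps follows. It is not Beam code; &lt;code>par_do_1&lt;/code>, &lt;code>par_do_2&lt;/code>, &lt;code>run_fused_with_retry&lt;/code>, and the single injected failure are assumptions used only to count how much work is repeated.&lt;/p>

```python
counts = {"par_do_1": 0, "par_do_2": 0}
state = {"fail_once": True}


def par_do_1(x):
    """Hypothetical first step; counts every invocation."""
    counts["par_do_1"] += 1
    return x + 1


def par_do_2(y):
    """Hypothetical second step; fails once on the value 8."""
    if y == 8 and state["fail_once"]:
        state["fail_once"] = False
        raise RuntimeError("ParDo2 failed")
    counts["par_do_2"] += 1
    return y * 10


def run_fused_with_retry(bundle, max_attempts=2):
    """Run both steps together; a failure in either one restarts both."""
    for _ in range(max_attempts):
        try:
            intermediate = [par_do_1(x) for x in bundle]  # never persisted
            return [par_do_2(y) for y in intermediate]
        except RuntimeError:
            continue  # intermediate output is discarded along with the rest
    raise RuntimeError("fused stage failed")


bundle_b = [6, 7, 8, 9]
output = run_fused_with_retry(bundle_b)
print(output, counts)
```

&lt;p>Although only the second step failed, the first step ran on every input element twice, because its output was held in memory rather than persisted.&lt;/p>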
&lt;p>Executing transforms this way allows a runner to avoid persisting elements
between transforms, saving on persistence costs.&lt;/p></description></item><item><title>Documentation: File processing patterns</title><link>/documentation/patterns/file-processing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/patterns/file-processing/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="file-processing-patterns">File processing patterns&lt;/h1>
&lt;p>This page describes common file processing tasks. For more information on file-based I/O, see &lt;a href="/documentation/programming-guide/#pipeline-io">Pipeline I/O&lt;/a> and &lt;a href="/documentation/programming-guide/#file-based-data">File-based input and output data&lt;/a>.&lt;/p>
&lt;nav class="language-switcher">
&lt;strong>Adapt for:&lt;/strong>
&lt;ul>
&lt;li data-value="java" class="active">Java SDK&lt;/li>
&lt;li data-value="py">Python SDK&lt;/li>
&lt;/ul>
&lt;/nav>
&lt;h2 id="processing-files-as-they-arrive">Processing files as they arrive&lt;/h2>
&lt;p>This section shows you how to process files as they arrive in your file system or object store (like Google Cloud Storage). You can continuously read files or trigger stream and batch processing pipelines when a file arrives.&lt;/p>
&lt;h3 id="continuous-read-mode">Continuous read mode&lt;/h3>
&lt;p class="language-java">You can use &lt;code>FileIO&lt;/code> or &lt;code>TextIO&lt;/code> to continuously read the source for new files.&lt;/p>
&lt;p class="language-java">Use the &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html">&lt;code>FileIO&lt;/code>&lt;/a> class to continuously watch a single file pattern. The following example matches a file pattern repeatedly every 30 seconds, continuously returns new matched files as an unbounded &lt;code>PCollection&amp;lt;Metadata&amp;gt;&lt;/code>, and stops if no new files appear for one hour:&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// This produces PCollection&amp;lt;MatchResult.Metadata&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">FileIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">match&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">filepattern&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;...&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">continuously&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardSeconds&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">30&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">Watch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Growth&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">afterTimeSinceNewOutput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardHours&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">))));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">The &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html">&lt;code>TextIO&lt;/code>&lt;/a> class &lt;code>watchForNewFiles&lt;/code> property streams new file matches.&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// This produces PCollection&amp;lt;String&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TextIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">read&lt;/span>&lt;span class="o">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">from&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;&amp;lt;path-to-files&amp;gt;/*&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">watchForNewFiles&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Check for new files every minute.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardMinutes&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Stop watching the file pattern if no new files appear for an hour.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">Watch&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">Growth&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">afterTimeSinceNewOutput&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Duration&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">standardHours&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1&lt;/span>&lt;span class="o">))));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;p class="language-java">Some runners may retain file lists during updates, but file lists don’t persist when you restart a pipeline. You can save file lists by:&lt;/p>
&lt;p class="language-java">&lt;ul>
&lt;li>Storing processed filenames in an external file and deduplicating the lists at the next transform&lt;/li>
&lt;li>Adding timestamps to filenames, writing a glob pattern to pull in only new files, and matching the pattern when the pipeline restarts&lt;/li>
&lt;/ul>
&lt;/p>
&lt;p class="language-py">The continuous-read option is not available for Python.&lt;/p>
&lt;h3 id="stream-processing-triggered-from-external-source">Stream processing triggered from external source&lt;/h3>
&lt;p>A streaming pipeline can process data from an unbounded source. For example, to trigger stream processing with Google Cloud Pub/Sub:&lt;/p>
&lt;ol>
&lt;li>Use an external process to detect when new files arrive.&lt;/li>
&lt;li>Send a Google Cloud Pub/Sub message with a URI to the file.&lt;/li>
&lt;li>Access the URI from a &lt;code>DoFn&lt;/code> that follows the Google Cloud Pub/Sub source.&lt;/li>
&lt;li>Process the file.&lt;/li>
&lt;/ol>
&lt;h3 id="batch-processing-triggered-from-external-source">Batch processing triggered from external source&lt;/h3>
&lt;p>To start or schedule a batch pipeline job when a file arrives, write the triggering event in the source file itself. This approach has the highest latency because the pipeline must initialize before processing. It’s best suited for low-frequency updates with large file sizes.&lt;/p>
&lt;h2 id="accessing-filenames">Accessing filenames&lt;/h2>
&lt;p class="language-java">Use the &lt;code>FileIO&lt;/code> class to read filenames in a pipeline job. &lt;code>FileIO&lt;/code> returns a &lt;code>PCollection&amp;lt;ReadableFile&amp;gt;&lt;/code> object, and the &lt;code>ReadableFile&lt;/code> instance contains the filename.&lt;/p>
&lt;p class="language-java">To access filenames:&lt;/p>
&lt;p class="language-java">&lt;ol>
&lt;li>Create a &lt;code>ReadableFile&lt;/code> instance with &lt;code>FileIO&lt;/code>. &lt;code>FileIO&lt;/code> returns a &lt;code>PCollection&amp;lt;ReadableFile&amp;gt;&lt;/code> object. The &lt;code>ReadableFile&lt;/code> class contains the filename.&lt;/li>
&lt;li>Call the &lt;code>readFullyAsUTF8String()&lt;/code> method to read the file into memory and return its contents as a &lt;code>String&lt;/code> object. If memory is limited, you can use utility classes like &lt;a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileSystems.html">&lt;code>FileSystems&lt;/code>&lt;/a> to work directly with the file.&lt;/li>
&lt;/ol>
&lt;/p>
&lt;p class="language-py">To read filenames in a pipeline job:&lt;/p>
&lt;p class="language-py">&lt;ol>
&lt;li>Collect the list of file URIs. You can use the &lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.filesystems.html?highlight=filesystems#module-apache_beam.io.filesystems">&lt;code>FileSystems&lt;/code>&lt;/a> module to get a list of files that match a glob pattern.&lt;/li>
&lt;li>Pass the file URIs to a &lt;code>PCollection&lt;/code>.&lt;/li>
&lt;/ol>
&lt;/p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="n">p&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">FileIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">match&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">filepattern&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;hdfs://path/to/*.gz&amp;#34;&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// The withCompression method is optional. By default, the Beam SDK detects compression from
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="c1">// the filename.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">FileIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">readMatches&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">withCompression&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Compression&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">GZIP&lt;/span>&lt;span class="o">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">ParDo&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">new&lt;/span> &lt;span class="n">DoFn&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">FileIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ReadableFile&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@ProcessElement&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">process&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nd">@Element&lt;/span> &lt;span class="n">FileIO&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">ReadableFile&lt;/span> &lt;span class="n">file&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// We can now access the file and its metadata.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">LOG&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">info&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;File Metadata resourceId is {} &amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">file&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">getMetadata&lt;/span>&lt;span class="o">().&lt;/span>&lt;span class="na">resourceId&lt;/span>&lt;span class="o">());&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
&lt;div class='language-py snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
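&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-py" data-lang="py">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">apache_beam&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">beam&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">apache_beam.io&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">fileio&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pipeline&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="p">:&lt;/span>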
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">readable_files&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pipeline&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">fileio&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">MatchFiles&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;hdfs://path/to/*.txt&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">fileio&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ReadMatches&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Reshuffle&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">files_and_contents&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">readable_files&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">|&lt;/span> &lt;span class="n">beam&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">metadata&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">read_utf8&lt;/span>&lt;span class="p">())))&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div></description></item><item><title>Documentation: Filter</title><link>/documentation/transforms/java/elementwise/filter/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/transforms/java/elementwise/filter/</guid><description>
&lt;!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
&lt;h1 id="filter">Filter&lt;/h1>
&lt;table align="left">
&lt;a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Filter.html">
&lt;img src="/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
&lt;/a>
&lt;/table>
&lt;br>&lt;br>
&lt;p>Given a predicate, keeps only the elements that satisfy it and drops the rest.
Can also filter by comparing each element against a given value, using
the natural ordering of the elements.&lt;/p>
&lt;h2 id="examples">Examples&lt;/h2>
&lt;p>&lt;strong>Example 1&lt;/strong>: Filtering with a predicate&lt;/p>
&lt;p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Assumes pipeline is an existing Pipeline object.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">allStrings&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="s">&amp;#34;Hello&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;world&amp;#34;&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="s">&amp;#34;hi&amp;#34;&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">longStrings&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">allStrings&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">by&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="k">new&lt;/span> &lt;span class="n">SerializableFunction&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">String&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">Boolean&lt;/span>&lt;span class="o">&amp;gt;()&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nd">@Override&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kd">public&lt;/span> &lt;span class="n">Boolean&lt;/span> &lt;span class="nf">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">String&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">)&lt;/span> &lt;span class="o">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">input&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">length&lt;/span>&lt;span class="o">()&lt;/span> &lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">3&lt;/span>&lt;span class="o">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="o">}));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
The result is a &lt;code>PCollection&lt;/code> containing &amp;ldquo;Hello&amp;rdquo; and &amp;ldquo;world&amp;rdquo;.&lt;/p>
&lt;p>&lt;strong>Example 2&lt;/strong>: Filtering with an inequality&lt;/p>
&lt;p>
&lt;div class='language-java snippet'>
&lt;div class="notebook-skip code-snippet">
&lt;a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
&lt;img src="/images/copy-icon.svg"/>
&lt;/a>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-java" data-lang="java">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Assumes pipeline is an existing Pipeline object.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">numbers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pipeline&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Create&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">of&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">1L&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">2L&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">3L&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">4L&lt;/span>&lt;span class="o">,&lt;/span> &lt;span class="n">5L&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">bigNumbers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">numbers&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">greaterThan&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">3L&lt;/span>&lt;span class="o">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">PCollection&lt;/span>&lt;span class="o">&amp;lt;&lt;/span>&lt;span class="n">Long&lt;/span>&lt;span class="o">&amp;gt;&lt;/span> &lt;span class="n">smallNumbers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">numbers&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">apply&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">Filter&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="na">lessThanEq&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="n">3L&lt;/span>&lt;span class="o">));&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;/div>
Other variants include &lt;code>Filter.greaterThanEq&lt;/code>, &lt;code>Filter.lessThan&lt;/code> and &lt;code>Filter.equal&lt;/code>.&lt;/p>
&lt;p>&lt;strong>Example 3&lt;/strong>: Filtering with a lambda&lt;/p>
&lt;div class="playground-wrapper">
&lt;div class="playground-snippets">
&lt;div
class="language-java playground-snippet"
data-sdk="java"
data-path="SDK_JAVA_Filter"
>&lt;/div>
&lt;/div>
&lt;div
class="code-snippet code-snippet-playground"
data-src="https://play.beam.apache.org/embedded?editable=1&amp;examples=%5b%7b%22path%22%3a%22SDK_JAVA_Filter%22%2c%22sdk%22%3a%22java%22%7d%5d"
data-width="100%"
data-height="700px"
>&lt;/div>
&lt;/div>
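&lt;p>The playground snippet above is loaded at runtime, so the following is a minimal plain-Java sketch of the same idea, not the playground&amp;rsquo;s actual code. It applies the length predicate from Example 1 as a lambda to an in-memory array using only &lt;code>java.util&lt;/code>, so it runs without Beam on the classpath. The class name &lt;code>FilterLambdaSketch&lt;/code> and the helper &lt;code>keepLongerThan&lt;/code> are hypothetical names introduced for illustration. In a Beam pipeline, the same lambda would be passed to &lt;code>Filter.by&lt;/code> rather than to a stream.&lt;/p>

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of the Beam SDK.
final class FilterLambdaSketch {

  // Keeps only the words longer than minLength, preserving input order.
  // In a Beam pipeline the same lambda would be handed to Filter.by, for
  // example Filter.by((String s) -> s.length() > 3), instead of a stream.
  static String[] keepLongerThan(String[] words, int minLength) {
    return Arrays.stream(words)
        .filter(word -> word.length() > minLength)
        .toArray(String[]::new);
  }

  public static void main(String[] args) {
    String[] kept = keepLongerThan(new String[] {"Hello", "world", "hi"}, 3);
    System.out.println(Arrays.toString(kept)); // prints [Hello, world]
  }
}
```

&lt;p>As in Example 1, only &amp;ldquo;Hello&amp;rdquo; and &amp;ldquo;world&amp;rdquo; pass the predicate. The lambda form is shorter than the anonymous &lt;code>SerializableFunction&lt;/code> class but expresses the same test.&lt;/p>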
&lt;h2 id="related-transforms">Related transforms&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="/documentation/transforms/java/elementwise/flatmapelements">FlatMapElements&lt;/a> behaves the same as &lt;code>MapElements&lt;/code>, but for
each input it might produce zero or more outputs.&lt;/li>
&lt;li>&lt;a href="/documentation/transforms/java/elementwise/pardo">ParDo&lt;/a> is the most general element-wise mapping
operation, and includes other abilities such as multiple output collections and side inputs.&lt;/li>
&lt;/ul></description></item></channel></rss>