Prompt: What does this code do?
pipeline_options = PipelineOptions(pipeline_args) with beam.Pipeline(options=options) as p: (p | "Read" >> ReadFromPubSub(subscription=input_subscription) | "Process" >> ParDo(ProcessMessage()) | "Write" >> WriteToPubSub(topic=output_topic))
Response: This code reads messages from the Pub/Sub subscription input_subscription using the ReadFromPubSub function and processes them. The code then writes the resulting collection to the Pub/Sub topic output_topic using the WriteToPubSub transform.
Pub/Sub is only supported in streaming pipelines.
Reading messages directly from a topic is also supported. In this case, a temporary subscription is created automatically.
ReadFromPubSub produces a PCollection of PubsubMessage objects or a PCollection of byte sequences. The behavior is controlled by the with_attributes parameter, with byte sequences being the default. For more information about the PCollection object, see the Beam Programming Guide.
The ProcessMessage class processes the messages. This class is a subclass of the DoFn class. Its implementation might look like the following example:
class ProcessMessage(beam.DoFn): def process(self, element): yield element
For more information about the DoFn class, see the Beam Programming Guide.
For more details, see the Pub/Sub I/O transform documentation.
For a common pattern for configuring pipeline options, see the Pipeline option patterns section.