blob: 74d2bec4eec4d8a0c0c0039911705b3fb54f9511 [file] [log] [blame] [view]
Prompt:
What does this code do?
```python
pipeline_options = PipelineOptions(pipeline_args)
with beam.Pipeline(options=options) as p:
(p | "Read" >> ReadFromPubSub(subscription=input_subscription)
| "Process" >> ParDo(ProcessMessage())
| "Write" >> WriteToPubSub(topic=output_topic))
```
Response:
This code reads messages from the [Pub/Sub](https://cloud.google.com/pubsub/docs/overview) subscription `input_subscription` using the `ReadFromPubSub` function and processes them. The code then writes the resulting collection to the Pub/Sub topic `output_topic` using the `WriteToPubSub` transform.
Pub/Sub is only supported in streaming pipelines.
Reading messages directly from a topic is also supported. In this case, a temporary subscription is created automatically.
`ReadFromPubSub` produces a `PCollection` of `PubsubMessage` objects or a `PCollection` of byte sequences. The behavior is controlled by the `with_attributes` parameter, with byte sequences being the default. For more information about the `PCollection` object, see the [Beam Programming Guide](https://beam.apache.org/documentation/basics/#pcollection).
The `ProcessMessage` class processes the messages. This class is a subclass of the `DoFn` class.
Its implementation might look like the following example:
```python
class ProcessMessage(beam.DoFn):
def process(self, element):
yield element
```
For more information about the `DoFn` class, see the [Beam Programming Guide](https://beam.apache.org/documentation/programming-guide/#dofn).
For more details, see the [Pub/Sub I/O transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsub.html).
For a common pattern for configuring pipeline options, see the [Pipeline option patterns](https://beam.apache.org/documentation/patterns/pipeline-options/) section.