blob: 587effafa064601052538f507da21ff00029be16 [file] [log] [blame] [view]
Prompt:
What is a trigger in Apache Beam?
Response:
Apache Beam uses triggers to determine when to emit the aggregated results of each [window](https://beam.apache.org/documentation/programming-guide/#windowing), referred to as a pane.
Triggers provide two additional capabilities compared to [outputting at the end of a window](https://beam.apache.org/documentation/programming-guide/#default-trigger):
- Allowing early results to be output before the end of the window.
- Allowing late data to be handled after the end of the window.
This feature lets you control the flow of your data and balance between completeness, latency, and cost.
You set the triggers for a `PCollection` by setting the `trigger` parameter of the `WindowInto` transform.
```python
pcollection | WindowInto(
FixedWindows(1 * 60),
trigger=AfterProcessingTime(1 * 60),
accumulation_mode=AccumulationMode.DISCARDING)
```
When a trigger fires, it emits the current contents of the window as a pane. Because a trigger can fire multiple times, the accumulation mode determines whether the system accumulates the window panes as the trigger fires, or discards them. This behavior is controlled by the [window accumulation mode](https://beam.apache.org/documentation/programming-guide/#window-accumulation-modes) parameter of the `WindowInto` transform.
Beam provides several [built-in triggers](https://beam.apache.org/documentation/programming-guide/#triggers) that you can use to determine when to emit the results of your pipeline's windowed computations:
- Event time triggers
- Processing time triggers
- Data-driven triggers
- Composite triggers
One of the most useful trigger patterns is the [`AfterWatermark`](https://beam.apache.org/documentation/programming-guide/#composite-afterwatermark) trigger, which fires a single time when Apache Beam estimates that all the data has arrived, such as when the watermark passes the end of the window.