Prompt: What is a state in Apache Beam?
Response: State in Apache Beam is an API that allows for managing per-key state in element-wise operations like ParDo or Map. It enables fine-grained control over aggregations, particularly in situations where windowing and triggers are insufficient.
State is scoped to the current processing key within a window, meaning each key has its own state. State is particularly useful for complex aggregations across multiple elements.
Apache Beam provides several types of state that can be used for different use cases, including scalar and collection values:
ValueStateBagStateSetStateMapStateOrderedListStateCombiningStateThe per-key state needs to be garbage-collected because the increasing size of the state can negatively impact performance. For more information about common garbage collection strategies, see Garbage collecting state.
For more information, read the community blog post about Stateful Processing.
For additional examples of using state in Apache Beam, see State and Timers examples.