中文版

Lab Discussion: KeyedProcessFunction and Timers (Long Ride Alerts)

(Discussion of Lab: KeyedProcessFunction and Timers (Long Ride Alerts))

Analysis

These cases are worth noting:

  • The START event is missing. Then END event will sit in state indefinitely (this is a leak!).
  • The END event is missing. The timer will fire and the state will be cleared (this is ok).
  • The END event arrives after the timer has fired and cleared the state. In this case the END event will be stored in state indefinitely (this is another leak!).

These leaks could be addressed by either using state TTL, or another timer, to eventually clear any lingering state.

Bottom line

Regardless of how clever we are with what state we keep, and how long we choose to keep it, we should eventually clear it -- because otherwise our state will grow in an unbounded fashion. And having lost that information, we will run the risk of late events causing incorrect or duplicated results.

This tradeoff between keeping state indefinitely versus occasionally getting things wrong when events are late is a challenge that is inherent to stateful stream processing.

If you want to go further

For each of these, add tests to check for the desired behavior.

  • Extend the solution so that it never leaks state.
  • Define what it means for an event to be missing, detect missing START and END events, and send some notification of this to a side output.

Back to Labs Overview