layout: section title: “Beam SQL: Joins” section_menu: section-menu/sdks.html permalink: /documentation/dsls/sql/joins/

Beam SQL: Joins

Supported JOIN types in Beam SQL:

  • INNER, LEFT OUTER, RIGHT OUTER
  • Only equijoins (where join condition is an equality check) are supported

Unsupported JOIN types in Beam SQL:

  • CROSS JOIN is not supported (full cartesian product with no ON clause)
  • FULL OUTER JOIN is not supported (combination of LEFT OUTER and RIGHT OUTER joins)

The scenarios of join can be categorized into 3 cases:

  1. Bounded input JOIN bounded input
  2. Unbounded input JOIN unbounded input
  3. Unbounded input JOIN bounded input

Bounded JOIN Bounded

Standard join implementation is used. All elements from one input are matched with all elements from another input. Due to the fact that both inputs are bounded, no windowing or triggering is involved.

Unbounded JOIN Unbounded

Standard join implementation is used. All elements from one input are matched with all elements from another input.

Windowing and Triggering

The following properties must be satisfied when joining unbounded inputs:

  • Inputs must have compatible windows, otherwise IllegalArgumentException will be thrown.
  • Triggers on each input should only fire once per window. Currently this means that the only supported trigger in this case is DefaultTrigger with zero allowed lateness. Using any other trigger will result in UnsupportedOperationException thrown.

This means that inputs are joined per-window. That is, when the trigger fires (only once), then join is performed on all elements in the current window in both inputs. This allows to reason about what kind of output is going to be produced.

Note: similarly to GroupByKeys JOIN will update triggers using Trigger.continuationTrigger(). Other aspects of the inputs' windowing strategies remain unchanged.

Unbounded JOIN Bounded

For this type of JOIN bounded input is treated as a side-input by the implementation.

This means that

  • window/trigger is inherented from upstreams, which should be consistent