blob: ec68a24bb7b362dd9723e2c00e08455a663f6f20 [file] [log] [blame] [view]
---
layout: section
title: "Beam SQL: Joins"
section_menu: section-menu/sdks.html
permalink: /documentation/dsls/sql/joins/
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Beam SQL: Joins
Supported `JOIN` types in Beam SQL:
* `INNER`, `LEFT OUTER`, `RIGHT OUTER`
* Only equijoins (where join condition is an equality check) are supported
Unsupported `JOIN` types in Beam SQL:
* `CROSS JOIN` is not supported (full cartesian product with no `ON` clause)
* `FULL OUTER JOIN` is not supported (combination of `LEFT OUTER` and `RIGHT OUTER` joins)
The scenarios of join can be categorized into 3 cases:
1. Bounded input `JOIN` bounded input
2. Unbounded input `JOIN` unbounded input
3. Unbounded input `JOIN` bounded input
## Bounded JOIN Bounded {#join-bounded-bounded}
Standard join implementation is used. All elements from one input are matched
with all elements from another input. Due to the fact that both inputs are
bounded, no windowing or triggering is involved.
## Unbounded JOIN Unbounded {#join-unbounded-unbounded}
Standard join implementation is used. All elements from one input are matched
with all elements from another input.
**Windowing and Triggering**
The following properties must be satisfied when joining unbounded inputs:
- Inputs must have compatible windows, otherwise `IllegalArgumentException`
will be thrown.
- Triggers on each input should only fire once per window. Currently this
means that the only supported trigger in this case is `DefaultTrigger` with
zero allowed lateness. Using any other trigger will result in
`UnsupportedOperationException` thrown.
This means that inputs are joined per-window. That is, when the trigger fires
(only once), then join is performed on all elements in the current window in
both inputs. This allows to reason about what kind of output is going to be
produced.
**Note:** similarly to `GroupByKeys` `JOIN` will update triggers using
`Trigger.continuationTrigger()`. Other aspects of the inputs' windowing
strategies remain unchanged.
## Unbounded JOIN Bounded {#join-unbounded-bounded}
For this type of `JOIN` bounded input is treated as a side-input by the
implementation.
This means that
- window/trigger is inherented from upstreams, which should be consistent