Join-library provides inner join, outer left and right join functions to Apache Beam. The aim is to simplify the most common cases of join to a simple function call.
The functions are generic so it supports join of any types supported by Beam. Input to the join functions are PCollections of Key/Values. Both the left and right PCollections need the same type for the key. All the join functions return a Key/Value where Key is the join key and value is a Key/Value where the key is the left value and right is the value.
In the cases of outer join, since null cannot be serialized the user have to provide a value that represent null for that particular use case.
Example how to use join-library:
PCollection<KV<String, String>> leftPcollection = ... PCollection<KV<String, Long>> rightPcollection = ... PCollection<KV<String, KV<String, Long>>> joinedPcollection = Join.innerJoin(leftPcollection, rightPcollection);
Join-library can be found on maven-central:
<dependency> <groupId>org.apache.beam</groupId> <artifactId>join-library</artifactId> <version>0.1-incubating-SNAPSHOT</version> </dependency>
Questions or comments: M.Runesson [at] gmail [dot] com