The samples on this page describe common patterns using Schemas. Schemas provide us a type-system for Beam records that is independent of any specific programming-language type. There might be multiple Java classes that all have the same schema (for example a Protocol-Buffer class or a POJO class), and Beam will allow us to seamlessly convert between these types. Schemas also provide a simple way to reason about types across different programming-language APIs. For more information, see the programming guide section on Schemas.
{{< language-switcher java >}}
Beam supports equijoins on schema PCollections of Schemas where the join condition depends on the equality of a subset of fields.
Consider using Join if you have multiple collections that provide information about related things, and their structure is known.
For example let's say we have two different collections with user data: one collection contains names and email addresses; the other collection contains names and phone numbers. We can join the two collections using the name as a common key and the other data as the associated values. After the join, we have one collection that contains all the information (email address and phone numbers) associated with each name.
The following conceptual example uses two input collections to show the mechanism of Join.
First, we define Schemas and User data.
{{< highlight java >}} {{< code_sample “examples/java/src/test/java/org/apache/beam/examples/snippets/SnippetsTest.java” SchemaJoinPatternCreate >}} {{< /highlight >}}
Then we create the Pcollections for user data and perform join on the two PCollections using a Join.
{{< highlight java >}} {{< code_sample “examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java” SchemaJoinPatternJoin >}} {{< /highlight >}}
The result Row is of the type Row: [Row(emailSchema), Row(phoneSchema)], and it can be converted to desired format as shown in the code snippet below.
{{< highlight java >}} {{< code_sample “examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java” SchemaJoinPatternFormat >}} {{< /highlight >}}