The Hazelcast Jet Runner can be used to execute Beam pipelines using Hazelcast Jet.
The Jet Runner and Jet are suitable for large-scale continuous jobs and provide:
Note that the Jet Runner is currently in an EXPERIMENTAL state and cannot make use of many of the capabilities present in Jet:
The [Beam Capability Matrix]({{ site.baseurl }}/documentation/runners/capability-matrix/) documents the supported capabilities of the Jet Runner.
Follow the instructions on the [Java Quickstart page]({{ site.baseurl }}/get-started/quickstart-java/#get-the-wordcount-code) to get the WordCount code.
Run the following command in the Beam examples project to start a new Jet cluster and run the WordCount example on it:

    $ mvn package exec:java \
        -DskipTests \
        -Dexec.mainClass=org.apache.beam.examples.WordCount \
        -Dexec.args="\
            --runner=JetRunner \
            --jetLocalMode=3 \
            --inputFile=pom.xml \
            --output=counts" \
        -Pjet-runner
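Once the job has finished, the results can be inspected from the shell. The following is a minimal sketch, assuming the job wrote one or more files whose names start with the `--output` prefix (`counts` above) and that every line has the `word: count` form produced by the WordCount example. `top_words` is a hypothetical helper name, not part of Beam or Jet.

```shell
# Hypothetical helper: print the N most frequent words from WordCount output.
# Assumption: every output file starts with the given prefix and each line
# looks like "word: count".
top_words() {
    prefix="$1"
    n="${2:-10}"   # number of lines to show, 10 if not given
    # Sort numerically on the count (the field after ':'), highest first,
    # then keep only the first n lines.
    cat "$prefix"* | sort -t ':' -k2,2nr | awk -v n="$n" 'NR <= n'
}
```

For example, `top_words counts 5` would list the five most frequent words found in the files whose names start with `counts`.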
Download the latest stable Hazelcast Jet release from the Hazelcast website and start a Jet cluster. The simplest way is to start each cluster member with the `jet-start.sh` script that comes with the Jet distribution. The members use the auto-discovery feature to form a cluster. Let's start a cluster of two members:

    $ cd hazelcast-jet
    $ bin/jet-start.sh &
    $ bin/jet-start.sh &
Check that the cluster is up and running:

    $ ./jet.sh cluster
You should see something like:

    State: ACTIVE
    Version: 3.0
    Size: 2

    ADDRESS                  UUID
    [192.168.0.117]:5701     76bea7ba-f032-4c25-ad04-bdef6782f481
    [192.168.0.117]:5702     03ecfaa2-be16-41b6-b5cf-eea584d7fb86
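When starting the cluster from a script, it can help to check the reported size automatically rather than by eye. The following is a rough sketch, assuming the status output keeps the `Size: <n>` line shown above. `cluster_size` is a hypothetical helper name, not a Jet command.

```shell
# Hypothetical helper: read cluster status text on stdin and print the
# number that follows "Size:".
# Assumption: the status output contains a line of the form "Size: <n>".
cluster_size() {
    awk '$1 == "Size:" { print $2 }'
}

# Possible usage (not executed here):
#   if [ "$(./jet.sh cluster | cluster_size)" = "2" ]; then
#       echo "both members have joined"
#   fi
```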
Download Jet Management Center from the same location and use it to monitor your cluster and subsequent job executions.
Change to the Beam examples project directory and run the following command to submit and execute your pipeline on the remote Jet cluster. Make sure the input file (the file with the words to be counted) is available on all machines where the cluster runs; otherwise the word count job won't be able to read the data.

    $ mvn package exec:java \
        -DskipTests \
        -Dexec.mainClass=org.apache.beam.examples.WordCount \
        -Dexec.args="\
            --runner=JetRunner \
            --jetServers=192.168.0.117:5701,192.168.0.117:5702 \
            --codeJarPathname=target/word-count-beam-bundled-0.1.jar \
            --inputFile=<INPUT_FILE_AVAILABLE_ON_ALL_CLUSTER_MEMBERS> \
            --output=/tmp/counts" \
        -Pjet-runner
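The value passed to `--jetServers` is a comma-separated list of member addresses, so it can be derived from the cluster status output instead of being typed by hand. The following is a minimal sketch, assuming the member lines look like `[host]:port UUID` as in the sample output earlier and that hosts are IPv4 addresses or host names. `jet_servers` is a hypothetical helper name, not part of Jet or Beam.

```shell
# Hypothetical helper: turn cluster status text on stdin into a
# comma-separated "host:port,host:port" list.
# Assumption: each member line starts with "[host]:port".
jet_servers() {
    awk '
        $1 ~ /^\[/ {
            addr = $1
            sub(/^\[/, "", addr)   # drop the opening bracket
            sub(/\]/, "", addr)    # drop the closing bracket
            out = (out == "" ? addr : out "," addr)
        }
        END { print out }
    '
}

# Possible usage (not executed here):
#   SERVERS=$(./jet.sh cluster | jet_servers)
#   ... --jetServers="$SERVERS" ...
```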