website/src/documentation/runners/jet.md

layout: section title: “Hazelcast Jet Runner” section_menu: section-menu/runners.html permalink: /documentation/runners/jet/ redirect_from: /learn/runners/jet/

Overview

The Hazelcast Jet Runner can be used to execute Beam pipelines using Hazelcast Jet.

The Jet Runner and Jet are suitable for large scale continuous jobs and provide:

Support for both batch (bounded) and streaming (unbounded) data sets
A runtime that supports very high throughput and low event latency at the same time
Natural back-pressure in streaming programs
Distributed massively parallel data processing engine with in memory storage

It's important to note that the Jet Runner is currently in an EXPERIMENTAL state and can not make use of many of the capabilities present in Jet:

Jet has full Fault Tolerance support, the Jet Runner does not; if a job fails it must be restarted
Internal performance of Jet is extremely high. The Runner can't match it as of now because Beam pipeline optimization/surgery has not been fully implemented.

The [Beam Capability Matrix]({{ site.baseurl }}/documentation/runners/capability-matrix/) documents the supported capabilities of the Jet Runner.

Running WordCount with the Hazelcast Jet Runner

Generating the Beam examples project

Just follow the instruction from the [Java Quickstart page]({{ site.baseurl }}/get-started/quickstart-java/#get-the-wordcount-code)

Running WordCount on a Local Jet Cluster

Issue following command in the Beam examples project to start new Jet cluster and run the WordCount example on it.

    $ mvn package exec:java \
        -DskipTests \
        -Dexec.mainClass=org.apache.beam.examples.WordCount \
        -Dexec.args="\
            --runner=JetRunner \
            --jetLocalMode=3 \
            --inputFile=pom.xml \
            --output=counts" \
        -Pjet-runner

Running WordCount on a Remote Jet Cluster

Download latest stable Hazelcast Jet code from Hazelcast Website and start Jet cluster. The simplest way is to start Jet cluster member using the jet-start script that comes with Jet distribution. The members use the auto discovery feature to form a cluster. Let's start up a cluster formed by two members:

    $ cd hazelcast-jet
    $ bin/jet-start.sh &
    $ bin/jet-start.sh &

Check the cluster is up and running:

    $ ./jet.sh cluster

You should see something like:

State: ACTIVE
Version: 3.0
Size: 2

ADDRESS                  UUID               
[192.168.0.117]:5701     76bea7ba-f032-4c25-ad04-bdef6782f481
[192.168.0.117]:5702     03ecfaa2-be16-41b6-b5cf-eea584d7fb86

Download Jet Management Center from the same location and use it to monitor your cluster and later executions.

Change directory to the Beam Examples project and issue following command to submit and execute your Pipeline on the remote Jet cluster. Make sure to distribute the input file (file with the words to be counted) to all machines where the cluster runs. The word count job won't be able to read the data otherwise.