
Apex Beam Runner (Apex-Runner)

Apex-Runner is a Runner for Apache Beam which executes Beam pipelines with Apache Apex as the underlying engine. The runner has broad support for the Beam model and supports streaming and batch pipelines.

Apache Apex is a stream processing platform and framework for low-latency, high-throughput and fault-tolerant analytics applications on Apache Hadoop. Apex is Java-based and provides its own API for application development (native compositional and declarative Java API, SQL) along with a comprehensive operator library. Apex has a unified streaming architecture and can be used for real-time and batch processing. With its stateful stream processing architecture, Apex can support all of the concepts in the Beam model (event time, triggers, watermarks, etc.).

##Status

Apex-Runner is relatively new. It is fully functional and can currently be used to run pipelines in embedded mode. It does not yet take advantage of all the performance and scalability that Apex can deliver; upcoming work is expected to address this by leveraging Apex features such as incremental checkpointing, partitioning and operator affinity. See JIRA for details; contributions are welcome!

##Getting Started

The following shows how to run the WordCount example that is provided with the source code on Apex (the example is identical to the one provided as part of the Beam examples).

###Installing Beam

To get the latest version of Beam with Apex-Runner, first clone the Beam repository:

    git clone https://github.com/apache/incubator-beam

Then switch to the newly created directory and run Maven to build Apache Beam:

    cd incubator-beam
    mvn clean install -DskipTests

Now Apache Beam and the Apex Runner are installed in your local Maven repository.
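
The build steps above need `git` and `mvn` (and a JDK) on the `PATH`. As a minimal sketch, a check like the following could be run before cloning; `missing_tools` is a hypothetical helper written for this illustration and is not part of Beam or Apex.

```shell
# Hypothetical helper (not part of Beam or Apex): report which of the given
# commands cannot be found on PATH.
# Returns 0 if every command is found, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools git mvn java && echo "ready to build"` prints the message only when all three tools are available.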

###Running an Example

Download something to count:

    curl http://www.gutenberg.org/cache/epub/1128/pg1128.txt > /tmp/kinglear.txt
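
A rough local word count of the downloaded file can serve as a baseline for checking the pipeline's results. The sketch below uses only standard tools; `local_word_count` is a hypothetical helper, and treating words as runs of ASCII letters is an assumption that may not match the tokenization used by the WordCount example exactly.

```shell
# Hypothetical helper (not part of Beam): count word occurrences in a text file.
# Assumption: a word is a run of ASCII letters; the Beam example may split
# words differently, so small discrepancies are possible.
local_word_count() {
    # Turn every run of non-letters into a single newline: one word per line.
    LC_ALL=C tr -cs 'A-Za-z' '\n' < "$1" \
        | awk 'NF { count[$0]++ } END { for (w in count) print w ": " count[w] }' \
        | LC_ALL=C sort
}
```

For example, `local_word_count /tmp/kinglear.txt | head` shows the first few words and their counts in sorted order.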

Run the pipeline, using the Apex runner:

    cd examples/java
    mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=/tmp/kinglear.txt --output=/tmp/out --runner=ApexRunner" -Papex-runner

Once completed, there will be multiple output files with the base name given above:

    $ ls /tmp/out-*
    /tmp/out-00000-of-00003  /tmp/out-00001-of-00003  /tmp/out-00002-of-00003
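
Because the result is spread over several shard files, it can be convenient to merge them before inspecting the counts. The following is a minimal sketch, assuming each output line has the form `word: count`; `top_words` is a hypothetical helper and not part of Beam or Apex.

```shell
# Hypothetical helper (not part of Beam or Apex): merge the sharded WordCount
# output and print the N most frequent words as "count<TAB>word".
# Assumption: every output line looks like "word: count".
top_words() {
    base="$1"        # output base name, e.g. /tmp/out
    n="${2:-10}"     # number of words to show (default 10)
    # Shards are named <base>-SSSSS-of-NNNNN; the glob picks up all of them.
    cat "$base"-*-of-* \
        | awk -F': ' 'NF == 2 { print $2 "\t" $1 }' \
        | sort -k1,1nr -k2,2 \
        | head -n "$n"
}
```

For example, `top_words /tmp/out 5` lists the five most frequent words across all shards.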

##Running pipelines on an Apex YARN cluster

Coming soon.