The old Flink Runner will eventually be replaced by the Portable Runner, which enables running pipelines in languages other than Java. Please see the [Portability page]({{ site.baseurl }}/contribute/portability/) for the latest state.
The Apache Flink Runner can be used to execute Beam pipelines using Apache Flink. When using the Flink Runner you will create a jar file containing your job that can be executed on a regular Flink cluster. It's also possible to execute a Beam pipeline using Flink's local execution mode without setting up a cluster. This is helpful for developing and debugging your pipeline.
The Flink Runner and Flink are suitable for large scale, continuous jobs, and provide:
The [Beam Capability Matrix]({{ site.baseurl }}/documentation/runners/capability-matrix/) documents the supported capabilities of the Flink Runner.
If you want to use the local execution mode with the Flink Runner, you don't have to complete any setup.
To use the Flink Runner for executing on a cluster, you have to set up a Flink cluster by following the Flink setup quickstart.
The Flink cluster version has to match the version used by the FlinkRunner. To find out which version of Flink is compatible with your version of Beam, see the table below:
To download the right version, see the Flink downloads page.
For more information, the Flink Documentation can be helpful.
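As a rough sketch, bringing up a local standalone cluster from a Flink distribution looks like the following. The archive name below is only an example; pick the Flink version that matches your Beam release as described above.

```shell
# Unpack a Flink distribution matching your Beam release
# (the archive name here is only an example).
$ tar xzf flink-1.5.0-bin-hadoop27-scala_2.11.tgz
$ cd flink-1.5.0

# Start a local cluster: one JobManager and one TaskManager.
$ ./bin/start-cluster.sh

# Stop the cluster again when you are done.
$ ./bin/stop-cluster.sh
```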
When using Java, you must specify your dependency on the Flink Runner in your pom.xml.
```xml
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-flink_2.11</artifactId>
  <version>{{ site.release_latest }}</version>
</dependency>
```
This section is not applicable to the Beam SDK for Python.
For executing a pipeline on a Flink cluster you need to package your program along with all dependencies in a so-called fat jar. How you do this depends on your build system, but if you follow along with the [Beam Quickstart]({{ site.baseurl }}/get-started/quickstart/) this is the command that you have to run:
```shell
$ mvn package -Pflink-runner
```
The Beam Quickstart Maven project is set up to use the Maven Shade plugin to create a fat jar, and the `-Pflink-runner` argument makes sure to include the dependency on the Flink Runner.
For actually running the pipeline you would use this command:
```shell
$ mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
    -Pflink-runner \
    -Dexec.args="--runner=FlinkRunner \
      --inputFile=/path/to/pom.xml \
      --output=/path/to/counts \
      --flinkMaster=<flink master url> \
      --filesToStage=target/word-count-beam--bundled-0.1.jar"
```
If you have a Flink JobManager running on your local machine you can give `localhost:8081` for `flinkMaster`.
When executing your pipeline with the Flink Runner, you can set these pipeline options.
See the reference documentation for the [FlinkPipelineOptions]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/runners/flink/FlinkPipelineOptions.html) interface (and its subinterfaces) for the complete list of pipeline configuration options.
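As a minimal sketch of setting these options programmatically, assuming the Beam SDK and the Flink Runner are on your classpath (the option values below are placeholders, not recommendations):

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class FlinkOptionsExample {
  public static void main(String[] args) {
    // Parse command-line arguments (e.g. --flinkMaster=...) into
    // Flink-specific pipeline options.
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);

    options.setRunner(FlinkRunner.class);
    options.setFlinkMaster("localhost:8081"); // address of the JobManager
    options.setParallelism(2);                // default operator parallelism

    Pipeline p = Pipeline.create(options);
    // ... build your pipeline here ...
    p.run().waitUntilFinish();
  }
}
```

Options passed on the command line via `-Dexec.args` (as in the quickstart command above) end up in the same `FlinkPipelineOptions` instance, so either style works.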
You can monitor a running Flink job using the Flink JobManager Dashboard. By default, this is available at port `8081` of the JobManager node. If you have a Flink installation on your local machine, that would be `http://localhost:8081`.
If your pipeline uses an unbounded data source or sink, the Flink Runner will automatically switch to streaming mode. You can enforce streaming mode by using the `streaming` setting mentioned above.
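For example, starting from the quickstart command above, you could force streaming execution by adding the flag to the pipeline arguments (a sketch; the remaining arguments are elided):

```shell
$ mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
    -Pflink-runner \
    -Dexec.args="--runner=FlinkRunner --streaming=true ..."
```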