---
title: "Quickstart: Setup"
---
* This will be replaced by the TOC
{:toc}
Get Flink up and running in a few simple steps.
## Requirements
Flink runs on all __UNIX-like__ environments, e.g. __Linux__, __Mac OS X__, and __Cygwin__. The only requirement is a working __Java 6.x__ (or higher) installation.
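You can check this on the command line:
~~~bash
$ java -version   # Should report a Java version of 1.6 or higher
~~~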
## Download
Download the ready-to-run binary package. Choose the Flink distribution that __matches your Hadoop version__. If you are unsure which version to choose or just want to run Flink locally, pick the package for Hadoop 1.2.
<ul class="nav nav-tabs">
<li class="active"><a href="#bin-hadoop1" data-toggle="tab">Hadoop 1.2</a></li>
<li><a href="#bin-hadoop2" data-toggle="tab">Hadoop 2 (YARN)</a></li>
</ul>
<div class="tab-content text-center">
<div class="tab-pane active" id="bin-hadoop1">
<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-1',this.href]);" href="{{site.FLINK_DOWNLOAD_URL_HADOOP_1_STABLE}}"><i class="icon-download"> </i> Download Flink for Hadoop 1.2</a>
</div>
<div class="tab-pane" id="bin-hadoop2">
<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-2',this.href]);" href="{{site.FLINK_DOWNLOAD_URL_HADOOP_2_STABLE}}"><i class="icon-download"> </i> Download Flink for Hadoop 2</a>
</div>
</div>
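Alternatively, you can fetch the package directly from the command line; for example, the Hadoop 1.2 build that the button above links to:
~~~bash
$ wget {{site.FLINK_DOWNLOAD_URL_HADOOP_1_STABLE}}
~~~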
## Start
You are almost done.
1. Go to the download directory.
2. Unpack the downloaded archive.
3. Start Flink.
~~~bash
$ cd ~/Downloads # Go to download directory
$ tar xzf flink-*.tgz # Unpack the downloaded archive
$ cd flink-{{site.FLINK_VERSION_STABLE}}   # The archive unpacks into a versioned directory
$ bin/start-local.sh # Start Flink
~~~
Check the __JobManager's web frontend__ at [http://localhost:8081](http://localhost:8081) and make
sure everything is up and running.
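A quick sanity check from the command line (the exact log file name depends on your user and host name):
~~~bash
$ jps                                 # Should list a JobManager process
$ tail log/flink-*-jobmanager-*.log   # Have a look at the JobManager log
~~~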
## Run Example
Run the __Word Count example__ to see Flink at work.
* __Download test data__:
  ~~~bash
  $ wget -O hamlet.txt http://www.gutenberg.org/cache/epub/1787/pg1787.txt
  ~~~
* You now have a text file called _hamlet.txt_ in your working directory.
* __Start the example program__:
  ~~~bash
  $ bin/flink run \
      --jarfile ./examples/flink-java-examples-{{site.FLINK_VERSION_STABLE}}-WordCount.jar \
      --arguments file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt
  ~~~
* You will find a file called __wordcount-result.txt__ in your current directory.
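Each line of the output holds a word and its count. To peek at the most frequent words (assuming the default space-separated output format):
~~~bash
$ sort -n -r -k 2 wordcount-result.txt | head   # Show the ten most frequent words
~~~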
## Cluster Setup
__Running Flink on a cluster__ is as easy as running it locally. Having __passwordless SSH__ and
__the same directory structure__ on all your cluster nodes lets you use our scripts to control
everything.
1. Copy the unpacked __flink__ directory from the downloaded archive to the same file system path
on each node of your setup.
2. Choose a __master node__ (JobManager) and set the `jobmanager.rpc.address` key in
`conf/flink-conf.yaml` to its IP or hostname. Make sure that all nodes in your cluster have the same
`jobmanager.rpc.address` configured.
3. Add the IPs or hostnames (one per line) of all __worker nodes__ (TaskManager) to the slaves file
`conf/slaves`.
You can now __start the cluster__ at your master node with `bin/start-cluster.sh`.
The following __example__ illustrates the setup with three nodes (with IP addresses from _10.0.0.1_
to _10.0.0.3_ and hostnames _master_, _worker1_, _worker2_) and shows the contents of the
configuration files, which need to be accessible at the same path on all machines:
<div class="row">
<div class="col-md-6 text-center">
<img src="img/quickstart_cluster.png" style="width: 85%">
</div>
<div class="col-md-6">
<div class="row">
<p class="lead text-center">
/path/to/<strong>flink/conf/<br>flink-conf.yaml</strong>
<pre>jobmanager.rpc.address: 10.0.0.1</pre>
</p>
</div>
<div class="row" style="margin-top: 1em;">
<p class="lead text-center">
/path/to/<strong>flink/<br>conf/slaves</strong>
<pre>
10.0.0.2
10.0.0.3</pre>
</p>
</div>
</div>
</div>
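For the example above, distributing the configured directory from the master and bringing up the cluster could look like this (a sketch that assumes passwordless SSH and an existing `/path/to` directory on all nodes):
~~~bash
$ rsync -a /path/to/flink/ 10.0.0.2:/path/to/flink/   # Copy to worker1
$ rsync -a /path/to/flink/ 10.0.0.3:/path/to/flink/   # Copy to worker2
$ cd /path/to/flink
$ bin/start-cluster.sh   # Starts the JobManager and all TaskManagers listed in conf/slaves
~~~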
## Flink on YARN
You can easily deploy Flink on your existing __YARN cluster__.
1. Download the __Flink YARN package__ with the YARN client: [Flink for YARN]({{site.FLINK_DOWNLOAD_URL_YARN_STABLE}})
2. Make sure your __HADOOP_HOME__ (or _YARN_CONF_DIR_ or _HADOOP_CONF_DIR_) __environment variable__ is set, so that the client can read your YARN and HDFS configuration.
3. Run the __YARN client__ with `./bin/yarn-session.sh`. For example, the options `-n 10 -tm 8192` allocate 10 TaskManagers with 8 GB of memory each.
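Putting steps 2 and 3 together (the Hadoop configuration path is only an example and depends on your installation):
~~~bash
$ export HADOOP_CONF_DIR=/etc/hadoop/conf   # Point the client to your YARN and HDFS configuration
$ ./bin/yarn-session.sh -n 10 -tm 8192      # Allocate 10 TaskManagers with 8 GB of memory each
~~~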
For __more detailed instructions__, check out the programming guides and examples.