| --- |
| title: "Quickstart: Setup" |
| --- |
| |
| * This will be replaced by the TOC |
| {:toc} |
| |
| Get Flink up and running in a few simple steps. |
| |
| ## Requirements |
| Flink runs on all __UNIX-like__ environments, such as __Linux__, __Mac OS X__, and __Cygwin__. The only requirement is a working __Java 6.x__ (or higher) installation. |
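
As a quick check of the Java requirement, the following sketch extracts the major version from the first line printed by `java -version`. The helper name `java_major` is made up for this illustration and is not part of Flink. It assumes the usual `version "..."` banner format, which covers both legacy strings such as `1.6.0_45` and newer ones such as `17.0.2`.

```bash
# Hypothetical helper: print the major Java version found in a
# `java -version` banner line passed as the first argument.
java_major() {
  local ver
  # Pull out the quoted version string, e.g. 1.6.0_45 or 17.0.2.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    # Legacy scheme: 1.<major>.<minor>_<update>
    1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;
    # Newer scheme: <major>.<minor>.<patch>
    *)   echo "${ver%%[.-]*}" ;;
  esac
}

# Usage: java -version writes its banner to stderr, hence the 2>&1.
# java_major "$(java -version 2>&1 | head -n 1)"
```

A result of 6 or higher satisfies the requirement stated above.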
| |
| ## Download |
| Download the ready-to-run binary package. Choose the Flink distribution that __matches your Hadoop version__. If you are unsure which version to choose, or you just want to run Flink locally, pick the package for Hadoop 1.2. |
| |
| <ul class="nav nav-tabs"> |
| <li class="active"><a href="#bin-hadoop1" data-toggle="tab">Hadoop 1.2</a></li> |
| <li><a href="#bin-hadoop2" data-toggle="tab">Hadoop 2 (YARN)</a></li> |
| </ul> |
| <p> |
| <div class="tab-content text-center"> |
| <div class="tab-pane active" id="bin-hadoop1"> |
| <a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-1',this.href]);" href="{{site.FLINK_DOWNLOAD_URL_HADOOP_1_STABLE}}"><i class="icon-download"> </i> Download Flink for Hadoop 1.2</a> |
| </div> |
| <div class="tab-pane" id="bin-hadoop2"> |
| <a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-2',this.href]);" href="{{site.FLINK_DOWNLOAD_URL_HADOOP_2_STABLE}}"><i class="icon-download"> </i> Download Flink for Hadoop 2</a> |
| </div> |
| </div> |
| </p> |
| |
| |
| ## Start |
| You are almost done. |
| |
| 1. Go to the download directory. |
| 2. Unpack the downloaded archive. |
| 3. Start Flink. |
| |
| |
| ~~~bash |
| $ cd ~/Downloads # Go to download directory |
| $ tar xzf flink-*.tgz # Unpack the downloaded archive |
| $ cd flink |
| $ bin/start-local.sh # Start Flink |
| ~~~ |
| |
| Check the __JobManager's web frontend__ at [http://localhost:8081](http://localhost:8081) and make |
| sure everything is up and running. |
| |
| ## Run Example |
| |
| Run the __Word Count example__ to see Flink at work. |
| |
| * __Download test data__: |
| |
| ~~~bash |
| $ wget -O hamlet.txt http://www.gutenberg.org/cache/epub/1787/pg1787.txt |
| ~~~ |
| |
| * You now have a text file called _hamlet.txt_ in your working directory. |
| * __Start the example program__: |
| |
| ~~~bash |
| $ bin/flink run \ |
|     --jarfile ./examples/flink-java-examples-{{site.FLINK_VERSION_STABLE}}-WordCount.jar \ |
|     --arguments file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt |
| ~~~ |
| |
| * You will find a file called __wordcount-result.txt__ in your current directory. |
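
To take a quick look at the result, the sketch below lists the most frequent words. It assumes that each line of the result file has the form `<word> <count>`, separated by a single space. Check a few lines of your own file first, since this guide does not specify the output format. `top_words` is a hypothetical helper name, not something shipped with Flink.

```bash
# Hypothetical helper: print the N most frequent words (default 10).
# Assumption: every line of the file is "<word> <count>".
top_words() {
  # Sort numerically on the second field, largest count first.
  sort -t ' ' -k2,2nr "$1" | head -n "${2:-10}"
}

# Usage:
# top_words wordcount-result.txt 5
```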
| |
| |
| ## Cluster Setup |
| |
| __Running Flink on a cluster__ is as easy as running it locally. Having __passwordless SSH__ and |
| __the same directory structure__ on all your cluster nodes lets you use our scripts to control |
| everything. |
| |
| 1. Copy the unpacked __flink__ directory from the downloaded archive to the same file system path |
| on each node of your setup. |
| 2. Choose a __master node__ (JobManager) and set the `jobmanager.rpc.address` key in |
| `conf/flink-conf.yaml` to its IP or hostname. Make sure that all nodes in your cluster have the same |
| `jobmanager.rpc.address` configured. |
| 3. Add the IPs or hostnames (one per line) of all __worker nodes__ (TaskManager) to the slaves file, |
| `conf/slaves`. |
| |
| You can now __start the cluster__ at your master node with `bin/start-cluster.sh`. |
| |
| |
| The following __example__ illustrates the setup with three nodes (with IP addresses from _10.0.0.1_ |
| to _10.0.0.3_ and hostnames _master_, _worker1_, _worker2_) and shows the contents of the |
| configuration files, which need to be accessible at the same path on all machines: |
| |
| <div class="row"> |
| <div class="col-md-6 text-center"> |
| <img src="img/quickstart_cluster.png" style="width: 85%"> |
| </div> |
| <div class="col-md-6"> |
| <div class="row"> |
| <p class="lead text-center"> |
| /path/to/<strong>flink/conf/<br>flink-conf.yaml</strong> |
| <pre>jobmanager.rpc.address: 10.0.0.1</pre> |
| </p> |
| </div> |
| <div class="row" style="margin-top: 1em;"> |
| <p class="lead text-center"> |
| /path/to/<strong>flink/<br>conf/slaves</strong> |
| <pre> |
| 10.0.0.2 |
| 10.0.0.3</pre> |
| </p> |
| </div> |
| </div> |
| </div> |
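
The example configuration above could be written with a small shell helper such as the following sketch. `write_cluster_conf` is a hypothetical name, not a script shipped with Flink. It only touches the `jobmanager.rpc.address` key and the `conf/slaves` file described in the steps above, and leaves any other keys in `flink-conf.yaml` as they are.

```bash
# Hypothetical helper: write the master address and the worker list
# into a Flink conf directory.
# Arguments: <conf-dir> <master-address> <worker-address>...
write_cluster_conf() {
  local conf_dir=$1 master=$2
  shift 2
  local yaml="$conf_dir/flink-conf.yaml"
  touch "$yaml"
  # Drop any existing jobmanager.rpc.address line, keep all other keys.
  grep -v '^jobmanager\.rpc\.address:' "$yaml" > "$yaml.tmp" || true
  echo "jobmanager.rpc.address: $master" >> "$yaml.tmp"
  mv "$yaml.tmp" "$yaml"
  # One worker (TaskManager) address per line.
  printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Usage for the three-node example:
# write_cluster_conf /path/to/flink/conf 10.0.0.1 10.0.0.2 10.0.0.3
```

The resulting files still have to be copied to the same path on every node, for example with `scp`.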
| |
| ## Flink on YARN |
| You can easily deploy Flink on your existing __YARN cluster__. |
| |
| 1. Download the __Flink YARN package__ with the YARN client: [Flink for YARN]({{site.FLINK_DOWNLOAD_URL_YARN_STABLE}}) |
| 2. Make sure that your __HADOOP_HOME__ (or _YARN_CONF_DIR_ or _HADOOP_CONF_DIR_) __environment variable__ is set, so that Flink can read your YARN and HDFS configuration. |
| 3. Run the __YARN client__ with `./bin/yarn-session.sh`. For example, the options `-n 10 -tm 8192` allocate 10 TaskManagers with 8 GB of memory each. |
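
Before starting the YARN client, it can help to confirm that one of the environment variables from step 2 is actually set. The following sketch prints the name of the first one it finds. The helper name `hadoop_conf_source` and the order of the checks are assumptions made for this illustration, not the order Flink itself uses.

```bash
# Hypothetical helper: report which Hadoop-related variable is set.
# Returns non-zero if none of them is defined.
hadoop_conf_source() {
  local name value
  for name in YARN_CONF_DIR HADOOP_CONF_DIR HADOOP_HOME; do
    # Indirect lookup of the variable called $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name"
      return 0
    fi
  done
  echo "none of YARN_CONF_DIR, HADOOP_CONF_DIR, HADOOP_HOME is set" >&2
  return 1
}

# Usage: only start the session if a configuration source was found.
# hadoop_conf_source && ./bin/yarn-session.sh -n 10 -tm 8192
```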
| |
| For __more detailed instructions__, check out the programming guides and examples. |