---
title: "Quickstart: Setup"
# Top navigation
top-nav-group: quickstart
top-nav-pos: 1
top-nav-title: Setup
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

* This will be replaced by the TOC
{:toc}

Get Flink up and running in a few simple steps.

## Requirements

Flink runs on __Linux, Mac OS X, and Windows__. The only requirement for running Flink is a
working __Java 7.x__ (or higher) installation. Windows users, please take a look at the
[Flink on Windows]({{ site.baseurl }}/setup/local_setup.html#flink-on-windows) guide, which describes
how to run Flink on Windows for local setups.

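
You can quickly verify the Java installation from the command line (the exact
version string varies with your vendor and setup):

~~~bash
$ java -version   # should report version 1.7 or higher
~~~
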
## Download
Download the ready-to-run binary package. Choose the Flink distribution that __matches your Hadoop version__. If you are unsure which version to choose or you just want to run locally, pick the package for Hadoop 1.2.

<ul class="nav nav-tabs">
<li class="active"><a href="#bin-hadoop1" data-toggle="tab">Hadoop 1.2</a></li>
<li><a href="#bin-hadoop2" data-toggle="tab">Hadoop 2 (YARN)</a></li>
</ul>
<p>
<div class="tab-content text-center">
<div class="tab-pane active" id="bin-hadoop1">
<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-1',this.href]);" href="{{site.FLINK_DOWNLOAD_URL_HADOOP1_STABLE}}"><i class="icon-download"> </i> Download Flink for Hadoop 1.2</a>
</div>
<div class="tab-pane" id="bin-hadoop2">
<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-2',this.href]);" href="{{site.FLINK_DOWNLOAD_URL_HADOOP2_STABLE}}"><i class="icon-download"> </i> Download Flink for Hadoop 2</a>
</div>
</div>
</p>

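
If you are setting up a machine without a browser, you can also fetch the
package from a shell; the URL below is only a placeholder, so copy the actual
link for your Hadoop version from the download button above:

~~~bash
$ wget -O flink.tgz "<download-link-for-your-hadoop-version>"   # placeholder URL
~~~
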
## Start

1. Go to the download directory.
2. Unpack the downloaded archive.
3. Start Flink.

~~~bash
$ cd ~/Downloads            # Go to download directory
$ tar xzf flink-*.tgz       # Unpack the downloaded archive
$ cd flink-{{site.version}}
$ bin/start-local.sh        # Start Flink
~~~

Check the __JobManager's web frontend__ at [http://localhost:8081](http://localhost:8081) and make
sure everything is up and running.
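
If you want to check from the shell instead of the browser, you can poke the
web frontend and the JobManager log (log file names depend on your user and
host name):

~~~bash
$ curl -s http://localhost:8081 > /dev/null && echo "web frontend is up"
$ tail log/flink-*-jobmanager-*.log
~~~
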

## Run Example

Run the __Word Count example__ to see Flink at work.

* __Download test data__:

  ~~~bash
  $ wget -O hamlet.txt http://www.gutenberg.org/cache/epub/1787/pg1787.txt
  ~~~

* You now have a text file called _hamlet.txt_ in your working directory.
* __Start the example program__:

  ~~~bash
  $ bin/flink run ./examples/batch/WordCount.jar --input file://`pwd`/hamlet.txt --output file://`pwd`/wordcount-result.txt
  ~~~

* You will find a file called __wordcount-result.txt__ in your current directory.

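
To take a quick look at the result (the exact formatting depends on how the
example writes its output), print the first few lines:

~~~bash
$ head wordcount-result.txt
~~~
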
## Stop

To stop Flink when you are done, simply run:

~~~bash
$ bin/stop-local.sh
~~~

## Cluster Setup

__Running Flink on a cluster__ is as easy as running it locally. Having __passwordless SSH__ and
__the same directory structure__ on all your cluster nodes lets you use our scripts to control
everything.

1. Copy the unpacked __flink__ directory from the downloaded archive to the same file system path
on each node of your setup.
2. Choose a __master node__ (JobManager) and set the `jobmanager.rpc.address` key in
`conf/flink-conf.yaml` to its IP or hostname. Make sure that all nodes in your cluster have the same
`jobmanager.rpc.address` configured.
3. Add the IPs or hostnames (one per line) of all __worker nodes__ (TaskManagers) to the
`conf/slaves` file.

You can now __start the cluster__ at your master node with `bin/start-cluster.sh`.

The following __example__ illustrates the setup with three nodes (with IP addresses from _10.0.0.1_
to _10.0.0.3_ and hostnames _master_, _worker1_, _worker2_) and shows the contents of the
configuration files, which need to be accessible at the same path on all machines:

<div class="row">
<div class="col-md-6 text-center">
<img src="{{ site.baseurl }}/page/img/quickstart_cluster.png" style="width: 85%">
</div>
<div class="col-md-6">
<div class="row">
<p class="lead text-center">
/path/to/<strong>flink/conf/<br>flink-conf.yaml</strong>
<pre>jobmanager.rpc.address: 10.0.0.1</pre>
</p>
</div>
<div class="row" style="margin-top: 1em;">
<p class="lead text-center">
/path/to/<strong>flink/<br>conf/slaves</strong>
<pre>
10.0.0.2
10.0.0.3</pre>
</p>
</div>
</div>
</div>
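
For this three-node example, one way to distribute the configured directory and
start the cluster could look like the following sketch (it assumes `rsync`, the
passwordless SSH mentioned above, and the placeholder path `/path/to/flink`):

~~~bash
$ rsync -a /path/to/flink/ 10.0.0.2:/path/to/flink/   # copy to worker1
$ rsync -a /path/to/flink/ 10.0.0.3:/path/to/flink/   # copy to worker2
$ /path/to/flink/bin/start-cluster.sh                 # start the JobManager and TaskManagers
~~~
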

Have a look at the [Configuration]({{ site.baseurl }}/setup/config.html) section of the documentation to see other available configuration options.
For Flink to run efficiently, a few configuration values need to be set. In particular, the following are very important:

* the amount of available memory per TaskManager (`taskmanager.heap.mb`),
* the number of available CPUs per machine (`taskmanager.numberOfTaskSlots`),
* the total number of CPUs in the cluster (`parallelism.default`), and
* the temporary directories (`taskmanager.tmp.dirs`).
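
As a minimal sketch, these entries might look as follows in `conf/flink-conf.yaml`
(the numbers and the path are placeholders, not recommendations; pick values that
match your machines):

~~~yaml
# Hypothetical example values -- adapt them to your hardware
jobmanager.rpc.address: 10.0.0.1
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 4
parallelism.default: 8
taskmanager.tmp.dirs: /tmp
~~~
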

## Flink on YARN
You can easily deploy Flink on your existing __YARN cluster__.

1. Download the __Flink Hadoop2 package__: [Flink with Hadoop 2]({{site.FLINK_DOWNLOAD_URL_HADOOP2_STABLE}})
2. Make sure your __HADOOP_HOME__ (or _YARN_CONF_DIR_ or _HADOOP_CONF_DIR_) __environment variable__ is set, so that Flink can read your YARN and HDFS configuration.
3. Run the __YARN client__ with `./bin/yarn-session.sh`. For example, you can run the client with the options `-n 10 -tm 8192` to allocate 10 TaskManagers with 8 GB of memory each, as shown below.
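
For example, starting a session with the options from step 3 might look like
this (the Hadoop configuration path is a placeholder for your own setup):

~~~bash
$ export HADOOP_CONF_DIR=/etc/hadoop/conf   # placeholder: point to your YARN/HDFS configuration
$ ./bin/yarn-session.sh -n 10 -tm 8192      # 10 TaskManagers with 8 GB of memory each
~~~
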

For __more detailed instructions__, check out the programming guides and examples.