| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE html> |
| |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> |
| |
| <title>Apache Flink 0.9.0 Documentation: Cluster Setup</title> |
| |
| <link rel="shortcut icon" href="http://flink.apache.org/docs/0.9/page/favicon.ico" type="image/x-icon"> |
| <link rel="icon" href="http://flink.apache.org/docs/0.9/page/favicon.ico" type="image/x-icon"> |
| |
| <!-- Bootstrap --> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css"> |
| <link rel="stylesheet" href="http://flink.apache.org/docs/0.9/page/css/flink.css"> |
| <link rel="stylesheet" href="http://flink.apache.org/docs/0.9/page/css/syntax.css"> |
| <link rel="stylesheet" href="http://flink.apache.org/docs/0.9/page/css/codetabs.css"> |
| |
| <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> |
| <!-- WARNING: Respond.js doesn't work if you view the page via file:// --> |
| <!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| <body> |
| |
| |
| |
| |
| |
| |
| <!-- Top navbar. --> |
| <nav class="navbar navbar-default navbar-fixed-top"> |
| <div class="container"> |
| <!-- The logo. --> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1"> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <div class="navbar-logo"> |
| <a href="http://flink.apache.org"><img alt="Apache Flink" src="http://flink.apache.org/docs/0.9/page/img/navbar-brand-logo.jpg"></a> |
| </div> |
| </div><!-- /.navbar-header --> |
| |
| <!-- The navigation links. --> |
| <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1"> |
| <ul class="nav navbar-nav"> |
| <li><a href="http://flink.apache.org/docs/0.9/index.html">Overview<span class="hidden-sm hidden-xs"> 0.9.0</span></a></li> |
| |
| <!-- Setup --> |
| <li class="dropdown"> |
| <a href="http://flink.apache.org/docs/0.9/setup" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Setup <span class="caret"></span></a> |
| <ul class="dropdown-menu" role="menu"> |
| <li><a href="http://flink.apache.org/docs/0.9/setup/building.html">Get Flink 0.9-SNAPSHOT</a></li> |
| |
| <li class="divider"></li> |
| <li role="presentation" class="dropdown-header"><strong>Deployment</strong></li> |
| <li><a href="http://flink.apache.org/docs/0.9/setup/local_setup.html" class="active">Local</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/setup/cluster_setup.html">Cluster (Standalone)</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/setup/yarn_setup.html">YARN</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/setup/gce_setup.html">GCloud</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/setup/flink_on_tez.html">Flink on Tez <span class="badge">Beta</span></a></li> |
| |
| <li class="divider"></li> |
| <li><a href="http://flink.apache.org/docs/0.9/setup/config.html">Configuration</a></li> |
| </ul> |
| </li> |
| |
| <!-- Programming Guides --> |
| <li class="dropdown"> |
| <a href="http://flink.apache.org/docs/0.9/apis" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Programming Guides <span class="caret"></span></a> |
| <ul class="dropdown-menu" role="menu"> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/programming_guide.html"><strong>Batch: DataSet API</strong></a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/streaming_guide.html"><strong>Streaming: DataStream API</strong> <span class="badge">Beta</span></a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/python.html">Python API <span class="badge">Beta</span></a></li> |
| |
| <li class="divider"></li> |
| <li><a href="scala_shell.html">Interactive Scala Shell</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/dataset_transformations.html">Dataset Transformations</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/best_practices.html">Best Practices</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/example_connectors.html">Connectors</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/examples.html">Examples</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/local_execution.html">Local Execution</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/cluster_execution.html">Cluster Execution</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/cli.html">Command Line Interface</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/web_client.html">Web Client</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/iterations.html">Iterations</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/java8.html">Java 8</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/apis/hadoop_compatibility.html">Hadoop Compatibility <span class="badge">Beta</span></a></li> |
| </ul> |
| </li> |
| |
| <!-- Libraries --> |
| <li class="dropdown"> |
| <a href="http://flink.apache.org/docs/0.9/libs" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Libraries <span class="caret"></span></a> |
| <ul class="dropdown-menu" role="menu"> |
| <li><a href="http://flink.apache.org/docs/0.9/libs/spargel_guide.html">Graphs: Spargel</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/libs/gelly_guide.html">Graphs: Gelly <span class="badge">Beta</span></a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/libs/ml/">Machine Learning <span class="badge">Beta</span></a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/libs/table.html">Relational: Table <span class="badge">Beta</span></a></li> |
| </ul> |
| </li> |
| |
| <!-- Internals --> |
| <li class="dropdown"> |
| <a href="http://flink.apache.org/docs/0.9/internals" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Internals <span class="caret"></span></a> |
| <ul class="dropdown-menu" role="menu"> |
| <li role="presentation" class="dropdown-header"><strong>Contribute</strong></li> |
| <li><a href="http://flink.apache.org/docs/0.9/internals/how_to_contribute.html">How to Contribute</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/internals/coding_guidelines.html">Coding Guidelines</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/internals/ide_setup.html">IDE Setup</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/internals/logging.html">Logging</a></li> |
| <li class="divider"></li> |
| <li role="presentation" class="dropdown-header"><strong>Internals</strong></li> |
| <li><a href="http://flink.apache.org/docs/0.9/internals/general_arch.html">Architecture & Process Model</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/internals/types_serialization.html">Type Extraction & Serialization</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/internals/job_scheduling.html">Jobs & Scheduling</a></li> |
| <li><a href="http://flink.apache.org/docs/0.9/internals/add_operator.html">How-To: Add an Operator</a></li> |
| </ul> |
| </li> |
| </ul> |
| <form class="navbar-form navbar-right hidden-sm hidden-md" role="search" action="http://flink.apache.org/docs/0.9/search-results.html"> |
| <div class="form-group"> |
| <input type="text" class="form-control" name="q" placeholder="Search all pages"> |
| </div> |
| <button type="submit" class="btn btn-default">Search</button> |
| </form> |
| </div><!-- /.navbar-collapse --> |
| </div><!-- /.container --> |
| </nav> |
| |
| |
| |
| |
| <!-- Main content. --> |
| <div class="container"> |
| |
| |
| <div class="row"> |
| <div class="col-sm-10 col-sm-offset-1"> |
| <h1>Cluster Setup</h1> |
| |
| |
| |
| <p>This documentation is intended to provide instructions on how to run |
| Flink in a fully distributed fashion on a static (but possibly |
| heterogeneous) cluster.</p> |
| |
| <p>This involves two steps: first, installing and configuring Flink, and |
| second, installing and configuring the <a href="http://hadoop.apache.org/">Hadoop Distributed |
| Filesystem</a> (HDFS).</p> |
| |
| <ul id="markdown-toc"> |
| <li><a href="#preparing-the-cluster" id="markdown-toc-preparing-the-cluster">Preparing the Cluster</a> <ul> |
| <li><a href="#software-requirements" id="markdown-toc-software-requirements">Software Requirements</a></li> |
| <li><a href="#configuring-remote-access-with-ssh" id="markdown-toc-configuring-remote-access-with-ssh">Configuring Remote Access with ssh</a></li> |
| <li><a href="#setting-javahome-on-each-node" id="markdown-toc-setting-javahome-on-each-node">Setting JAVA_HOME on each Node</a></li> |
| </ul> |
| </li> |
| <li><a href="#hadoop-distributed-filesystem-hdfs-setup" id="markdown-toc-hadoop-distributed-filesystem-hdfs-setup">Hadoop Distributed Filesystem (HDFS) Setup</a> <ul> |
| <li><a href="#downloading-installing-and-configuring-hdfs" id="markdown-toc-downloading-installing-and-configuring-hdfs">Downloading, Installing, and Configuring HDFS</a></li> |
| <li><a href="#starting-hdfs" id="markdown-toc-starting-hdfs">Starting HDFS</a></li> |
| </ul> |
| </li> |
| <li><a href="#flink-setup" id="markdown-toc-flink-setup">Flink Setup</a> <ul> |
| <li><a href="#configuring-the-cluster" id="markdown-toc-configuring-the-cluster">Configuring the Cluster</a></li> |
| <li><a href="#starting-flink" id="markdown-toc-starting-flink">Starting Flink</a></li> |
| <li><a href="#starting-flink-in-the-streaming-mode" id="markdown-toc-starting-flink-in-the-streaming-mode">Starting Flink in the streaming mode</a></li> |
| </ul> |
| </li> |
| </ul> |
| |
| <h2 id="preparing-the-cluster">Preparing the Cluster</h2> |
| |
| <h3 id="software-requirements">Software Requirements</h3> |
| |
| <p>Flink runs on all <em>UNIX-like environments</em>, e.g. <strong>Linux</strong>, <strong>Mac OS X</strong>, |
| and <strong>Cygwin</strong> (for Windows) and expects the cluster to consist of <strong>one master |
| node</strong> and <strong>one or more worker nodes</strong>. Before you start to set up the system, |
| make sure you have the following software installed <strong>on each node</strong>:</p> |
| |
| <ul> |
| <li><strong>Java 1.6.x</strong> or higher,</li> |
| <li><strong>ssh</strong> (sshd must be running to use the Flink scripts that manage |
| remote components)</li> |
| </ul> |
| |
| <p>If your cluster does not fulfill these software requirements, you will need to |
| install or upgrade the missing software.</p> |
| |
| <p>For example, on Ubuntu Linux, type in the following commands to install Java and |
| ssh:</p> |
| |
| <div class="highlight"><pre><code class="language-bash">sudo apt-get install ssh |
| sudo apt-get install openjdk-7-jre</code></pre></div> |
| |
| <p>You can check the correct installation of Java by issuing the following command:</p> |
| |
| <div class="highlight"><pre><code class="language-bash">java -version</code></pre></div> |
| |
| <p>The command should output something comparable to the following on every node of |
| your cluster (depending on your Java version, there may be small differences):</p> |
| |
| <div class="highlight"><pre><code class="language-bash">java version <span class="s2">"1.6.0_22"</span> |
| Java<span class="o">(</span>TM<span class="o">)</span> SE Runtime Environment <span class="o">(</span>build 1.6.0_22-b04<span class="o">)</span> |
| Java HotSpot<span class="o">(</span>TM<span class="o">)</span> 64-Bit Server VM <span class="o">(</span>build 17.1-b03, mixed mode<span class="o">)</span></code></pre></div> |
| |
| <p>To make sure the ssh daemon is running properly, you can use the command</p> |
| |
| <div class="highlight"><pre><code class="language-bash">ps aux <span class="p">|</span> grep sshd</code></pre></div> |
| |
| <p>Something comparable to the following line should appear in the output |
| of the command on every host of your cluster:</p> |
| |
| <div class="highlight"><pre><code class="language-bash">root <span class="m">894</span> 0.0 0.0 <span class="m">49260</span> <span class="m">320</span> ? Ss Jan09 0:13 /usr/sbin/sshd</code></pre></div> |
| |
| <h3 id="configuring-remote-access-with-ssh">Configuring Remote Access with ssh</h3> |
| |
| <p>In order to start/stop the remote processes, the master node requires access via |
| ssh to the worker nodes. It is most convenient to use ssh’s public key |
| authentication for this. To set up public key authentication, log on to the |
| master as the user who will later execute all the Flink components. <strong>The |
| same user (i.e. a user with the same user name) must also exist on all worker |
| nodes</strong>. For the remainder of this instruction we will refer to this user as |
| <em>flink</em>. Using the super user <em>root</em> is highly discouraged for security |
| reasons.</p> |
| |
| <p>Once you are logged in to the master node as the desired user, you must generate a |
| new public/private key pair. The following command will create a new |
| public/private key pair in the <em>.ssh</em> directory inside the home directory of |
| the user <em>flink</em>. See the ssh-keygen man page for more details. Note that |
| the private key is not protected by a passphrase.</p> |
| |
| <div class="highlight"><pre><code class="language-bash">ssh-keygen -b <span class="m">2048</span> -P <span class="s1">''</span> -f ~/.ssh/id_rsa</code></pre></div> |
| |
| <p>Next, copy/append the content of the file <em>.ssh/id_rsa.pub</em> to your |
| authorized_keys file. The content of the authorized_keys file defines which |
| public keys are considered trustworthy during the public key authentication |
| process. On most systems the appropriate command is</p> |
| |
| <div class="highlight"><pre><code class="language-bash">cat .ssh/id_rsa.pub >> .ssh/authorized_keys</code></pre></div> |
| |
| <p>On some Linux systems, the authorized keys file may also be expected by the ssh |
| daemon under <em>.ssh/authorized_keys2</em>. In either case, you should make sure the |
| file only contains those public keys which you consider trustworthy for each |
| node of your cluster.</p> |
| |
| <p>Finally, the authorized keys file must be copied to every worker node of your |
| cluster. You can do this by repeatedly typing in</p> |
| |
| <div class="highlight"><pre><code class="language-bash">scp .ssh/authorized_keys <worker>:~/.ssh/</code></pre></div> |
| |
| <p>and replacing <em><worker></em> with the host name of the respective worker node. |
| After having finished the copy process, you should be able to log on to each |
| worker node from your master node via ssh without a password.</p> |
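| |
| <p>You can verify the passwordless access, for example, with the following command, |
| replacing <em>&lt;worker&gt;</em> with the host name of one of your worker nodes. It should |
| print the worker’s host name without prompting for a password:</p> |
| |
| <div class="highlight"><pre><code class="language-bash">ssh &lt;worker&gt; hostname</code></pre></div> |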
| |
| <h3 id="setting-javahome-on-each-node">Setting JAVA_HOME on each Node</h3> |
| |
| <p>Flink requires the <code>JAVA_HOME</code> environment variable to be set on the |
| master and all worker nodes and to point to the directory of your Java |
| installation.</p> |
| |
| <p>You can set this variable in <code>conf/flink-conf.yaml</code> via the |
| <code>env.java.home</code> key.</p> |
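| |
| <p>For example, the corresponding entry in <code>conf/flink-conf.yaml</code> could look like |
| the following (the path is only a placeholder for your actual Java installation):</p> |
| |
| <div class="highlight"><pre><code>env.java.home: /path/to/java_home/</code></pre></div> |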
| |
| <p>Alternatively, add the following line to your shell profile. If you use the |
| <em>bash</em> shell (probably the most common shell), the shell profile is located in |
| <em>~/.bashrc</em>:</p> |
| |
| <div class="highlight"><pre><code class="language-bash"><span class="nb">export </span><span class="nv">JAVA_HOME</span><span class="o">=</span>/path/to/java_home/</code></pre></div> |
| |
| <p>If your ssh daemon supports user environments, you can also add <code>JAVA_HOME</code> to |
| <em>~/.ssh/environment</em>. As super user <em>root</em> you can enable ssh user |
| environments with the following commands:</p> |
| |
| <div class="highlight"><pre><code class="language-bash"><span class="nb">echo</span> <span class="s2">"PermitUserEnvironment yes"</span> >> /etc/ssh/sshd_config |
| /etc/init.d/ssh restart</code></pre></div> |
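| |
| <p>Afterwards, you can append the variable to <em>~/.ssh/environment</em> on each node, for |
| example (the path is again a placeholder):</p> |
| |
| <div class="highlight"><pre><code class="language-bash">echo "JAVA_HOME=/path/to/java_home/" >> ~/.ssh/environment</code></pre></div> |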
| |
| <h2 id="hadoop-distributed-filesystem-hdfs-setup">Hadoop Distributed Filesystem (HDFS) Setup</h2> |
| |
| <p>The Flink system can use the Hadoop Distributed Filesystem (HDFS) |
| to read and write data in a distributed fashion. It is also possible to run |
| Flink without HDFS or any other distributed file system.</p> |
| |
| <p>Make sure to have a running HDFS installation. The following instructions are |
| just a general overview of some required settings. Please consult one of the |
| many installation guides available online for more detailed instructions.</p> |
| |
| <p><strong>Note that the following instructions are based on Hadoop 1.2 and might differ |
| for Hadoop 2.</strong></p> |
| |
| <h3 id="downloading-installing-and-configuring-hdfs">Downloading, Installing, and Configuring HDFS</h3> |
| |
| <p>Similar to the Flink system, HDFS runs in a distributed fashion. HDFS |
| consists of a <strong>NameNode</strong> which manages the distributed file system’s |
| metadata. The actual data is stored by one or more <strong>DataNodes</strong>. For the remainder |
| of these instructions we assume that the HDFS NameNode component runs on the master |
| node while all the worker nodes run an HDFS DataNode.</p> |
| |
| <p>To start, log on to your master node and download Hadoop (which includes HDFS) |
| from the Apache <a href="http://hadoop.apache.org/releases.html">Hadoop Releases</a> page.</p> |
| |
| <p>Next, extract the Hadoop archive.</p> |
| |
| <p>After having extracted the Hadoop archive, change into the Hadoop directory and |
| edit the Hadoop environment configuration file:</p> |
| |
| <div class="highlight"><pre><code class="language-bash"><span class="nb">cd </span>hadoop-* |
| vi conf/hadoop-env.sh</code></pre></div> |
| |
| <p>Uncomment and modify the following line in the file according to the path of |
| your Java installation.</p> |
| |
| <div class="highlight"><pre><code>export JAVA_HOME=/path/to/java_home/ |
| </code></pre></div> |
| |
| <p>Save the changes and open the HDFS configuration file <em>conf/hdfs-site.xml</em>. HDFS |
| offers multiple configuration parameters which affect the behavior of the |
| distributed file system in various ways. The following excerpt shows a minimal |
| configuration which is required to make HDFS work. More information on how to |
| configure HDFS can be found in the <a href="http://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html">HDFS User |
| Guide</a>.</p> |
| |
| <div class="highlight"><pre><code class="language-xml"><span class="nt"><configuration></span> |
| <span class="nt"><property></span> |
| <span class="nt"><name></span>fs.default.name<span class="nt"></name></span> |
| <span class="nt"><value></span>hdfs://MASTER:50040/<span class="nt"></value></span> |
| <span class="nt"></property></span> |
| <span class="nt"><property></span> |
| <span class="nt"><name></span>dfs.data.dir<span class="nt"></name></span> |
| <span class="nt"><value></span>DATAPATH<span class="nt"></value></span> |
| <span class="nt"></property></span> |
| <span class="nt"></configuration></span></code></pre></div> |
| |
| <p>Replace <em>MASTER</em> with the IP/host name of your master node which runs the |
| <em>NameNode</em>. <em>DATAPATH</em> must be replaced with the path to the directory in which the |
| actual HDFS data shall be stored on each worker node. Make sure that the |
| <em>flink</em> user has sufficient permissions to read and write in that |
| directory.</p> |
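| |
| <p>For example, assuming the data directory is <em>/data/hdfs</em> (a hypothetical path), you |
| could prepare it on each worker node as follows:</p> |
| |
| <div class="highlight"><pre><code class="language-bash">sudo mkdir -p /data/hdfs |
| sudo chown flink /data/hdfs</code></pre></div> |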
| |
| <p>After having saved the HDFS configuration file, open the file <em>conf/slaves</em> and |
| enter the IP/host name of those worker nodes which shall act as <em>DataNode</em>s. |
| Each entry must be separated by a line break.</p> |
| |
| <div class="highlight"><pre><code><worker 1> |
| <worker 2> |
| . |
| . |
| . |
| <worker n> |
| </code></pre></div> |
| |
| <p>Initialize the HDFS by typing in the following command. Note that the |
| command will <strong>delete all data</strong> which has been previously stored in the |
| HDFS. However, since we have just installed a fresh HDFS, it should be |
| safe to answer the confirmation with <em>yes</em>.</p> |
| |
| <div class="highlight"><pre><code class="language-bash">bin/hadoop namenode -format</code></pre></div> |
| |
| <p>Finally, we need to make sure that the Hadoop directory is available to |
| all worker nodes which are intended to act as DataNodes and that all nodes |
| <strong>find the directory under the same path</strong>. We recommend using a shared network |
| directory (e.g. an NFS share) for that. Alternatively, one can copy the |
| directory to all nodes (with the disadvantage that all configuration and |
| code updates need to be synced to all nodes).</p> |
| |
| <h3 id="starting-hdfs">Starting HDFS</h3> |
| |
| <p>To start HDFS, log on to the master node and type in the following |
| commands:</p> |
| |
| <div class="highlight"><pre><code class="language-bash"><span class="nb">cd </span>hadoop-* |
| bin/start-dfs.sh</code></pre></div> |
| |
| <p>If your HDFS setup is correct, you should be able to open the HDFS |
| status website at <em>http://MASTER:50070</em>. Within a matter of seconds, |
| all DataNodes should appear as live nodes. For troubleshooting we would |
| like to point you to the <a href="http://wiki.apache.org/hadoop/QuickStart">Hadoop Quick |
| Start</a> |
| guide.</p> |
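| |
| <p>You can also query the NameNode from the command line to check that all DataNodes |
| have registered, for example:</p> |
| |
| <div class="highlight"><pre><code class="language-bash">bin/hadoop dfsadmin -report</code></pre></div> |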
| |
| <h2 id="flink-setup">Flink Setup</h2> |
| |
| <p>Go to the <a href="http://flink.apache.org/docs/0.9/downloads.html">downloads page</a> and get the ready-to-run |
| package. Make sure to pick the Flink package <strong>matching your Hadoop |
| version</strong>.</p> |
| |
| <p>After downloading the latest release, copy the archive to your master node and |
| extract it:</p> |
| |
| <div class="highlight"><pre><code class="language-bash">tar xzf flink-*.tgz |
| <span class="nb">cd </span>flink-*</code></pre></div> |
| |
| <h3 id="configuring-the-cluster">Configuring the Cluster</h3> |
| |
| <p>After having extracted the system files, you need to configure Flink for |
| the cluster by editing <em>conf/flink-conf.yaml</em>.</p> |
| |
| <p>Set the <code>jobmanager.rpc.address</code> key to point to your master node. Furthermore, |
| define the maximum amount of main memory the JVM is allowed to allocate on each |
| node by setting the <code>jobmanager.heap.mb</code> and <code>taskmanager.heap.mb</code> keys.</p> |
| |
| <p>The values are given in MB. If some worker nodes have more main memory that you |
| want to allocate to the Flink system, you can overwrite the default value |
| by setting the environment variable <code>FLINK_TM_HEAP</code> on the respective |
| node.</p> |
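| |
| <p>For example, the relevant entries in <em>conf/flink-conf.yaml</em> could look like the |
| following (the address and memory sizes are only placeholders for your own values):</p> |
| |
| <div class="highlight"><pre><code>jobmanager.rpc.address: 192.168.0.100 |
| jobmanager.heap.mb: 1024 |
| taskmanager.heap.mb: 2048 |
| </code></pre></div> |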
| |
| <p>Finally, you must provide a list of all nodes in your cluster which shall be used |
| as worker nodes. To do so, similar to the HDFS configuration, edit the file |
| <em>conf/slaves</em> and enter the IP/host name of each worker node. Each worker node |
| will later run a TaskManager.</p> |
| |
| <p>Each entry must be separated by a new line, as in the following example:</p> |
| |
| <div class="highlight"><pre><code>192.168.0.100 |
| 192.168.0.101 |
| . |
| . |
| . |
| 192.168.0.150 |
| </code></pre></div> |
| |
| <p>The Flink directory must be available on every worker node under the same |
| path. As with HDFS, you can use a shared NFS directory, or copy the |
| entire Flink directory to every worker node.</p> |
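| |
| <p>If you choose to copy the directory, one way to do so is a small loop over the |
| <em>conf/slaves</em> file, sketched below under the assumption that the Flink directory is |
| <em>/opt/flink</em> on all nodes and that passwordless ssh is set up as described above:</p> |
| |
| <div class="highlight"><pre><code class="language-bash"># copy the Flink directory to every worker listed in conf/slaves |
| for worker in $(cat conf/slaves); do |
|   rsync -a /opt/flink/ "$worker:/opt/flink/" |
| done</code></pre></div> |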
| |
| <p>Please see the <a href="config.html">configuration page</a> for details and additional |
| configuration options.</p> |
| |
| <p>In particular,</p> |
| |
| <ul> |
| <li>the amount of available memory per TaskManager (<code>taskmanager.heap.mb</code>),</li> |
| <li>the number of available CPUs per machine (<code>taskmanager.numberOfTaskSlots</code>),</li> |
| <li>the total number of CPUs in the cluster (<code>parallelism.default</code>) and</li> |
| <li>the temporary directories (<code>taskmanager.tmp.dirs</code>)</li> |
| </ul> |
| |
| <p>are very important configuration values.</p> |
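| |
| <p>A sketch of how these entries might look in <em>conf/flink-conf.yaml</em>, assuming worker |
| machines with 8 CPU cores and 10 workers in total (the values are only examples):</p> |
| |
| <div class="highlight"><pre><code>taskmanager.heap.mb: 2048 |
| taskmanager.numberOfTaskSlots: 8 |
| parallelism.default: 80 |
| taskmanager.tmp.dirs: /tmp |
| </code></pre></div> |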
| |
| <h3 id="starting-flink">Starting Flink</h3> |
| |
| <p>The following script starts a JobManager on the local node and connects via |
| SSH to all worker nodes listed in the <em>slaves</em> file to start a |
| TaskManager on each node. After that, your Flink system is up and |
| running, and the JobManager on the local node will accept jobs |
| at the configured RPC port.</p> |
| |
| <p>Assuming that you are on the master node and inside the Flink directory:</p> |
| |
| <div class="highlight"><pre><code class="language-bash">bin/start-cluster.sh</code></pre></div> |
| |
| <p>To stop Flink, there is also a <code>stop-cluster.sh</code> script.</p> |
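| |
| <p>For example, again from within the Flink directory on the master node:</p> |
| |
| <div class="highlight"><pre><code class="language-bash">bin/stop-cluster.sh</code></pre></div> |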
| |
| <h3 id="starting-flink-in-the-streaming-mode">Starting Flink in the streaming mode</h3> |
| |
| <div class="highlight"><pre><code class="language-bash">bin/start-cluster-streaming.sh</code></pre></div> |
| |
| <p>The streaming mode changes the startup behavior of Flink: the system does not |
| bring up the managed memory services with preallocated memory at startup. |
| Flink streaming does not use the managed memory employed by the batch operators. |
| By not starting these services with preallocated memory, streaming jobs can benefit |
| from more heap space being available.</p> |
| |
| <p>Note that you can still start batch jobs in the streaming mode. The memory manager |
| will then allocate memory segments from the Java heap as needed.</p> |
| |
| </div> |
| |
| <div class="col-sm-10 col-sm-offset-1"> |
| <!-- Disqus thread and some vertical offset --> |
| <div style="margin-top: 75px; margin-bottom: 50px" id="disqus_thread"></div> |
| </div> |
| </div> |
| |
| </div><!-- /.container --> |
| |
| <!-- jQuery (necessary for Bootstrap's JavaScript plugins) --> |
| <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.2/jquery.min.js"></script> |
| <!-- Include all compiled plugins (below), or include individual files as needed --> |
| <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script> |
| <script src="http://flink.apache.org/docs/0.9/page/js/codetabs.js"></script> |
| |
| <!-- Google Analytics --> |
| <script> |
| (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ |
| (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), |
| m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) |
| })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); |
| |
| ga('create', 'UA-52545728-1', 'auto'); |
| ga('send', 'pageview'); |
| </script> |
| |
| <!-- Disqus --> |
| <script type="text/javascript"> |
| var disqus_shortname = 'stratosphere-eu'; |
| (function() { |
| var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; |
| dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; |
| (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); |
| })(); |
| </script> |
| </body> |
| </html> |