<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>Apache Flink 0.9.0 Documentation: Cluster Setup</title>
<link rel="shortcut icon" href="http://flink.apache.org/docs/0.9/page/favicon.ico" type="image/x-icon">
<link rel="icon" href="http://flink.apache.org/docs/0.9/page/favicon.ico" type="image/x-icon">
<!-- Bootstrap -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css">
<link rel="stylesheet" href="http://flink.apache.org/docs/0.9/page/css/flink.css">
<link rel="stylesheet" href="http://flink.apache.org/docs/0.9/page/css/syntax.css">
<link rel="stylesheet" href="http://flink.apache.org/docs/0.9/page/css/codetabs.css">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<!-- Top navbar. -->
<nav class="navbar navbar-default navbar-fixed-top">
<div class="container">
<!-- The logo. -->
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<div class="navbar-logo">
<a href="http://flink.apache.org"><img alt="Apache Flink" src="http://flink.apache.org/docs/0.9/page/img/navbar-brand-logo.jpg"></a>
</div>
</div><!-- /.navbar-header -->
<!-- The navigation links. -->
<div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
<ul class="nav navbar-nav">
<li><a href="http://flink.apache.org/docs/0.9/index.html">Overview<span class="hidden-sm hidden-xs"> 0.9.0</span></a></li>
<!-- Setup -->
<li class="dropdown">
<a href="http://flink.apache.org/docs/0.9/setup" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Setup <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="http://flink.apache.org/docs/0.9/setup/building.html">Get Flink 0.9-SNAPSHOT</a></li>
<li class="divider"></li>
<li role="presentation" class="dropdown-header"><strong>Deployment</strong></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/local_setup.html" class="active">Local</a></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/cluster_setup.html">Cluster (Standalone)</a></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/yarn_setup.html">YARN</a></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/gce_setup.html">GCloud</a></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/flink_on_tez.html">Flink on Tez <span class="badge">Beta</span></a></li>
<li class="divider"></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/config.html">Configuration</a></li>
</ul>
</li>
<!-- Programming Guides -->
<li class="dropdown">
<a href="http://flink.apache.org/docs/0.9/apis" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Programming Guides <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="http://flink.apache.org/docs/0.9/apis/programming_guide.html"><strong>Batch: DataSet API</strong></a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/streaming_guide.html"><strong>Streaming: DataStream API</strong> <span class="badge">Beta</span></a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/python.html">Python API <span class="badge">Beta</span></a></li>
<li class="divider"></li>
<li><a href="scala_shell.html">Interactive Scala Shell</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/dataset_transformations.html">Dataset Transformations</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/best_practices.html">Best Practices</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/example_connectors.html">Connectors</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/examples.html">Examples</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/local_execution.html">Local Execution</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/cluster_execution.html">Cluster Execution</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/cli.html">Command Line Interface</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/web_client.html">Web Client</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/iterations.html">Iterations</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/java8.html">Java 8</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/hadoop_compatibility.html">Hadoop Compatibility <span class="badge">Beta</span></a></li>
</ul>
</li>
<!-- Libraries -->
<li class="dropdown">
<a href="http://flink.apache.org/docs/0.9/libs" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Libraries <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="http://flink.apache.org/docs/0.9/libs/spargel_guide.html">Graphs: Spargel</a></li>
<li><a href="http://flink.apache.org/docs/0.9/libs/gelly_guide.html">Graphs: Gelly <span class="badge">Beta</span></a></li>
<li><a href="http://flink.apache.org/docs/0.9/libs/ml/">Machine Learning <span class="badge">Beta</span></a></li>
<li><a href="http://flink.apache.org/docs/0.9/libs/table.html">Relational: Table <span class="badge">Beta</span></a></li>
</ul>
</li>
<!-- Internals -->
<li class="dropdown">
<a href="http://flink.apache.org/docs/0.9/internals" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Internals <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li role="presentation" class="dropdown-header"><strong>Contribute</strong></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/how_to_contribute.html">How to Contribute</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/coding_guidelines.html">Coding Guidelines</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/ide_setup.html">IDE Setup</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/logging.html">Logging</a></li>
<li class="divider"></li>
<li role="presentation" class="dropdown-header"><strong>Internals</strong></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/general_arch.html">Architecture &amp; Process Model</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/types_serialization.html">Type Extraction &amp; Serialization</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/job_scheduling.html">Jobs &amp; Scheduling</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/add_operator.html">How-To: Add an Operator</a></li>
</ul>
</li>
</ul>
<form class="navbar-form navbar-right hidden-sm hidden-md" role="search" action="http://flink.apache.org/docs/0.9/search-results.html">
<div class="form-group">
<input type="text" class="form-control" name="q" placeholder="Search all pages">
</div>
<button type="submit" class="btn btn-default">Search</button>
</form>
</div><!-- /.navbar-collapse -->
</div><!-- /.container -->
</nav>
<!-- Main content. -->
<div class="container">
<div class="row">
<div class="col-sm-10 col-sm-offset-1">
<h1>Cluster Setup</h1>
<p>This page describes how to run Flink in a fully distributed fashion
on a static (but possibly heterogeneous) cluster.</p>
<p>This involves two steps: first, installing and configuring Flink, and
second, installing and configuring the <a href="http://hadoop.apache.org/">Hadoop Distributed
Filesystem</a> (HDFS).</p>
<ul id="markdown-toc">
<li><a href="#preparing-the-cluster" id="markdown-toc-preparing-the-cluster">Preparing the Cluster</a> <ul>
<li><a href="#software-requirements" id="markdown-toc-software-requirements">Software Requirements</a></li>
<li><a href="#configuring-remote-access-with-ssh" id="markdown-toc-configuring-remote-access-with-ssh">Configuring Remote Access with ssh</a></li>
<li><a href="#setting-javahome-on-each-node" id="markdown-toc-setting-javahome-on-each-node">Setting JAVA_HOME on each Node</a></li>
</ul>
</li>
<li><a href="#hadoop-distributed-filesystem-hdfs-setup" id="markdown-toc-hadoop-distributed-filesystem-hdfs-setup">Hadoop Distributed Filesystem (HDFS) Setup</a> <ul>
<li><a href="#downloading-installing-and-configuring-hdfs" id="markdown-toc-downloading-installing-and-configuring-hdfs">Downloading, Installing, and Configuring HDFS</a></li>
<li><a href="#starting-hdfs" id="markdown-toc-starting-hdfs">Starting HDFS</a></li>
</ul>
</li>
<li><a href="#flink-setup" id="markdown-toc-flink-setup">Flink Setup</a> <ul>
<li><a href="#configuring-the-cluster" id="markdown-toc-configuring-the-cluster">Configuring the Cluster</a></li>
<li><a href="#starting-flink" id="markdown-toc-starting-flink">Starting Flink</a></li>
<li><a href="#starting-flink-in-the-streaming-mode" id="markdown-toc-starting-flink-in-the-streaming-mode">Starting Flink in streaming mode</a></li>
</ul>
</li>
</ul>
<h2 id="preparing-the-cluster">Preparing the Cluster</h2>
<h3 id="software-requirements">Software Requirements</h3>
<p>Flink runs on all <em>UNIX-like environments</em>, e.g. <strong>Linux</strong>, <strong>Mac OS X</strong>,
and <strong>Cygwin</strong> (for Windows), and expects the cluster to consist of <strong>one master
node</strong> and <strong>one or more worker nodes</strong>. Before you start to set up the system,
make sure you have the following software installed <strong>on each node</strong>:</p>
<ul>
<li><strong>Java 1.6.x</strong> or higher,</li>
<li><strong>ssh</strong> (sshd must be running to use the Flink scripts that manage
remote components)</li>
</ul>
<p>If your cluster does not fulfill these software requirements, you will need
to install or upgrade the respective software.</p>
<p>For example, on Ubuntu Linux, type in the following commands to install Java and
ssh:</p>
<div class="highlight"><pre><code class="language-bash">sudo apt-get install ssh
sudo apt-get install openjdk-7-jre</code></pre></div>
<p>You can check the correct installation of Java by issuing the following command:</p>
<div class="highlight"><pre><code class="language-bash">java -version</code></pre></div>
<p>The command should output something comparable to the following on every node of
your cluster (depending on your Java version, there may be small differences):</p>
<div class="highlight"><pre><code class="language-bash">java version <span class="s2">&quot;1.6.0_22&quot;</span>
Java<span class="o">(</span>TM<span class="o">)</span> SE Runtime Environment <span class="o">(</span>build 1.6.0_22-b04<span class="o">)</span>
Java HotSpot<span class="o">(</span>TM<span class="o">)</span> 64-Bit Server VM <span class="o">(</span>build 17.1-b03, mixed mode<span class="o">)</span></code></pre></div>
<p>To make sure the ssh daemon is running properly, you can use the command</p>
<div class="highlight"><pre><code class="language-bash">ps aux <span class="p">|</span> grep sshd</code></pre></div>
<p>Something comparable to the following line should appear in the output
of the command on every host of your cluster:</p>
<div class="highlight"><pre><code class="language-bash">root <span class="m">894</span> 0.0 0.0 <span class="m">49260</span> <span class="m">320</span> ? Ss Jan09 0:13 /usr/sbin/sshd</code></pre></div>
<h3 id="configuring-remote-access-with-ssh">Configuring Remote Access with ssh</h3>
<p>In order to start/stop the remote processes, the master node requires access via
ssh to the worker nodes. It is most convenient to use ssh’s public key
authentication for this. To setup public key authentication, log on to the
master as the user who will later execute all the Flink components. <strong>The
same user (i.e. a user with the same user name) must also exist on all worker
nodes</strong>. For the remainder of these instructions we will refer to this user as
<em>flink</em>. Using the super user <em>root</em> is highly discouraged for security
reasons.</p>
<p>Once you have logged in to the master node as the desired user, you must
generate a new public/private key pair. The following command will create a new
key pair in the <em>.ssh</em> directory inside the home directory of
the user <em>flink</em>. See the ssh-keygen man page for more details. Note that
the private key is not protected by a passphrase.</p>
<div class="highlight"><pre><code class="language-bash">ssh-keygen -b <span class="m">2048</span> -P <span class="s1">&#39;&#39;</span> -f ~/.ssh/id_rsa</code></pre></div>
<p>Next, copy/append the content of the file <em>.ssh/id_rsa.pub</em> to your
authorized_keys file. The content of the authorized_keys file defines which
public keys are considered trustworthy during the public key authentication
process. On most systems the appropriate command is</p>
<div class="highlight"><pre><code class="language-bash">cat .ssh/id_rsa.pub &gt;&gt; .ssh/authorized_keys</code></pre></div>
<p>On some Linux systems, the authorized keys file may also be expected by the ssh
daemon under <em>.ssh/authorized_keys2</em>. In either case, you should make sure the
file only contains those public keys which you consider trustworthy for each
node of the cluster.</p>
<p>Finally, the authorized keys file must be copied to every worker node of your
cluster. You can do this by repeatedly typing in</p>
<div class="highlight"><pre><code class="language-bash">scp .ssh/authorized_keys &lt;worker&gt;:~/.ssh/</code></pre></div>
<p>and replacing <em>&lt;worker&gt;</em> with the host name of the respective worker node.
After having finished the copy process, you should be able to log on to each
worker node from your master node via ssh without a password.</p>
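<p>The copy step above can also be sketched as a small loop. This is a dry run: the worker names below are placeholders for your own hosts, and the leading <code>echo</code> only prints each command, so remove it once the output looks right.</p>

```shell
# Dry-run sketch: print the scp command for every worker.
# "worker1 worker2 worker3" are placeholder host names -- replace them
# with the hosts of your cluster; drop the "echo" to copy for real.
workers="worker1 worker2 worker3"
for worker in $workers; do
  echo scp ~/.ssh/authorized_keys "${worker}:~/.ssh/"
done
```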
<h3 id="setting-javahome-on-each-node">Setting JAVA_HOME on each Node</h3>
<p>Flink requires the <code>JAVA_HOME</code> environment variable to be set on the
master and all worker nodes and to point to the directory of your Java
installation.</p>
<p>You can set this variable in <code>conf/flink-conf.yaml</code> via the
<code>env.java.home</code> key.</p>
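<p>For illustration, setting the key in <em>conf/flink-conf.yaml</em> looks as follows; the path is only an example and must point to your actual Java installation:</p>

```yaml
env.java.home: /usr/lib/jvm/java-7-openjdk
```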
<p>Alternatively, add the following line to your shell profile. If you use the
<em>bash</em> shell (probably the most common shell), the shell profile is located in
<em>~/.bashrc</em>:</p>
<div class="highlight"><pre><code class="language-bash"><span class="nb">export </span><span class="nv">JAVA_HOME</span><span class="o">=</span>/path/to/java_home/</code></pre></div>
<p>If your ssh daemon supports user environments, you can also add <code>JAVA_HOME</code> to
<em>~/.ssh/environment</em>. As super user <em>root</em> you can enable ssh user
environments with the following commands:</p>
<div class="highlight"><pre><code class="language-bash"><span class="nb">echo</span> <span class="s2">&quot;PermitUserEnvironment yes&quot;</span> &gt;&gt; /etc/ssh/sshd_config
/etc/init.d/ssh restart</code></pre></div>
<h2 id="hadoop-distributed-filesystem-hdfs-setup">Hadoop Distributed Filesystem (HDFS) Setup</h2>
<p>Flink can use the Hadoop Distributed Filesystem (HDFS)
to read and write data in a distributed fashion. It is also possible to run
Flink without HDFS or any other distributed file system.</p>
<p>Make sure to have a running HDFS installation. The following instructions are
just a general overview of some required settings. Please consult one of the
many installation guides available online for more detailed instructions.</p>
<p><strong>Note that the following instructions are based on Hadoop 1.2 and might differ
for Hadoop 2.</strong></p>
<h3 id="downloading-installing-and-configuring-hdfs">Downloading, Installing, and Configuring HDFS</h3>
<p>Similar to Flink, HDFS runs in a distributed fashion. HDFS
consists of a <strong>NameNode</strong> which manages the distributed file system’s
metadata. The actual data is stored by one or more <strong>DataNodes</strong>. For the remainder
of these instructions we assume that the HDFS NameNode component runs on the master
node while all the worker nodes run an HDFS DataNode.</p>
<p>To start, log on to your master node and download Hadoop (which includes HDFS)
from the Apache <a href="http://hadoop.apache.org/releases.html">Hadoop Releases</a> page.</p>
<p>Next, extract the Hadoop archive.</p>
<p>After having extracted the Hadoop archive, change into the Hadoop directory and
edit the Hadoop environment configuration file:</p>
<div class="highlight"><pre><code class="language-bash"><span class="nb">cd </span>hadoop-*
vi conf/hadoop-env.sh</code></pre></div>
<p>Uncomment and modify the following line in the file according to the path of
your Java installation.</p>
<div class="highlight"><pre><code>export JAVA_HOME=/path/to/java_home/
</code></pre></div>
<p>Save the changes and open the HDFS configuration file <em>conf/hdfs-site.xml</em>. HDFS
offers multiple configuration parameters which affect the behavior of the
distributed file system in various ways. The following excerpt shows a minimal
configuration which is required to make HDFS work. More information on how to
configure HDFS can be found in the <a href="http://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html">HDFS User
Guide</a>.</p>
<div class="highlight"><pre><code class="language-xml"><span class="nt">&lt;configuration&gt;</span>
<span class="nt">&lt;property&gt;</span>
<span class="nt">&lt;name&gt;</span>fs.default.name<span class="nt">&lt;/name&gt;</span>
<span class="nt">&lt;value&gt;</span>hdfs://MASTER:50040/<span class="nt">&lt;/value&gt;</span>
<span class="nt">&lt;/property&gt;</span>
<span class="nt">&lt;property&gt;</span>
<span class="nt">&lt;name&gt;</span>dfs.data.dir<span class="nt">&lt;/name&gt;</span>
<span class="nt">&lt;value&gt;</span>DATAPATH<span class="nt">&lt;/value&gt;</span>
<span class="nt">&lt;/property&gt;</span>
<span class="nt">&lt;/configuration&gt;</span></code></pre></div>
<p>Replace <em>MASTER</em> with the IP/host name of your master node which runs the
<em>NameNode</em>. <em>DATAPATH</em> must be replaced with the path to the directory in which
the actual HDFS data shall be stored on each worker node. Make sure that the
<em>flink</em> user has sufficient permissions to read and write in that
directory.</p>
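<p>As a small, optional sketch (the path below is a placeholder, not a location required by HDFS or Flink), such a directory could be prepared on each worker as follows:</p>

```shell
# Placeholder path -- substitute the DATAPATH from conf/hdfs-site.xml.
# Run as the "flink" user on each worker node.
datapath="$HOME/hdfs-data"
mkdir -p "$datapath"
chmod 700 "$datapath"   # only the owning user may read and write
# If the directory was created as root, hand it over instead:
#   chown flink "$datapath"
```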
<p>After having saved the HDFS configuration file, open the file <em>conf/slaves</em> and
enter the IP/host name of those worker nodes which shall act as <em>DataNode</em>s.
Each entry must be separated by a line break.</p>
<div class="highlight"><pre><code>&lt;worker 1&gt;
&lt;worker 2&gt;
.
.
.
&lt;worker n&gt;
</code></pre></div>
<p>Initialize HDFS by typing in the following command. Note that the
command will <strong>delete all data</strong> which has been previously stored in
HDFS. However, since we have just installed a fresh HDFS, it should be
safe to answer the confirmation with <em>yes</em>.</p>
<div class="highlight"><pre><code class="language-bash">bin/hadoop namenode -format</code></pre></div>
<p>Finally, we need to make sure that the Hadoop directory is available to
all worker nodes which are intended to act as DataNodes and that all nodes
<strong>find the directory under the same path</strong>. We recommend using a shared network
directory (e.g. an NFS share) for that. Alternatively, one can copy the
directory to all nodes (with the disadvantage that all configuration and
code updates need to be synced to all nodes).</p>
<h3 id="starting-hdfs">Starting HDFS</h3>
<p>To start HDFS, log on to the master node and type in the following
commands:</p>
<div class="highlight"><pre><code class="language-bash"><span class="nb">cd </span>hadoop-*
bin/start-dfs.sh</code></pre></div>
<p>If your HDFS setup is correct, you should be able to open the HDFS
status website at <em>http://MASTER:50070</em>. Within a few seconds,
all DataNodes should appear as live nodes. For troubleshooting we would
like to point you to the <a href="http://wiki.apache.org/hadoop/QuickStart">Hadoop Quick
Start</a>
guide.</p>
<h2 id="flink-setup">Flink Setup</h2>
<p>Go to the <a href="http://flink.apache.org/docs/0.9/downloads.html">downloads page</a> and get the ready-to-run
package. Make sure to pick the Flink package <strong>matching your Hadoop
version</strong>.</p>
<p>After downloading the latest release, copy the archive to your master node and
extract it:</p>
<div class="highlight"><pre><code class="language-bash">tar xzf flink-*.tgz
<span class="nb">cd </span>flink-*</code></pre></div>
<h3 id="configuring-the-cluster">Configuring the Cluster</h3>
<p>After having extracted the system files, you need to configure Flink for
the cluster by editing <em>conf/flink-conf.yaml</em>.</p>
<p>Set the <code>jobmanager.rpc.address</code> key to point to your master node. Furthermore,
define the maximum amount of main memory the JVM is allowed to allocate on each
node by setting the <code>jobmanager.heap.mb</code> and <code>taskmanager.heap.mb</code> keys.</p>
<p>The values are given in MB. If some worker nodes have more main memory that you
want to allocate to the Flink system, you can override the default value
by setting the environment variable <code>FLINK_TM_HEAP</code> on the respective
node.</p>
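<p>Putting these keys together, a minimal <em>conf/flink-conf.yaml</em> excerpt could look as follows; the host name and heap sizes are placeholders to adjust for your cluster:</p>

```yaml
jobmanager.rpc.address: master-node
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 2048
```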
<p>Finally, you must provide a list of all nodes in your cluster which shall be used
as worker nodes. To do so, similar to the HDFS configuration, edit the file
<em>conf/slaves</em> and enter the IP/host name of each worker node. Each worker node
will later run a TaskManager.</p>
<p>Each entry must be separated by a new line, as in the following example:</p>
<div class="highlight"><pre><code>192.168.0.100
192.168.0.101
.
.
.
192.168.0.150
</code></pre></div>
<p>The Flink directory must be available on every worker under the same
path. As with HDFS, you can use a shared NFS directory, or copy the
entire Flink directory to every worker node.</p>
<p>Please see the <a href="config.html">configuration page</a> for details and additional
configuration options.</p>
<p>In particular,</p>
<ul>
<li>the amount of available memory per TaskManager (<code>taskmanager.heap.mb</code>),</li>
<li>the number of available CPUs per machine (<code>taskmanager.numberOfTaskSlots</code>),</li>
<li>the total number of CPUs in the cluster (<code>parallelism.default</code>) and</li>
<li>the temporary directories (<code>taskmanager.tmp.dirs</code>)</li>
</ul>
<p>are very important configuration values.</p>
<h3 id="starting-flink">Starting Flink</h3>
<p>The following script starts a JobManager on the local node and connects via
SSH to all worker nodes listed in the <em>slaves</em> file to start a
TaskManager on each node. After that, your Flink system is up and
running. The JobManager running on the local node will accept jobs
at the configured RPC port.</p>
<p>Assuming that you are on the master node and inside the Flink directory:</p>
<div class="highlight"><pre><code class="language-bash">bin/start-cluster.sh</code></pre></div>
<p>To stop Flink, there is also a <code>stop-cluster.sh</code> script.</p>
<h3 id="starting-flink-in-the-streaming-mode">Starting Flink in streaming mode</h3>
<div class="highlight"><pre><code class="language-bash">bin/start-cluster-streaming.sh</code></pre></div>
<p>The streaming mode changes the startup behavior of Flink: the system does not
bring up the managed memory services with preallocated memory at startup.
Flink streaming does not use the managed memory employed by the batch operators,
so by not starting these services with preallocated memory, streaming jobs can
benefit from more available heap space.</p>
<p>Note that you can still start batch jobs in the streaming mode. The memory manager
will then allocate memory segments from the Java heap as needed.</p>
</div>
<div class="col-sm-10 col-sm-offset-1">
<!-- Disqus thread and some vertical offset -->
<div style="margin-top: 75px; margin-bottom: 50px" id="disqus_thread"></div>
</div>
</div>
</div><!-- /.container -->
<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.2/jquery.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
<script src="http://flink.apache.org/docs/0.9/page/js/codetabs.js"></script>
<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-52545728-1', 'auto');
ga('send', 'pageview');
</script>
<!-- Disqus -->
<script type="text/javascript">
var disqus_shortname = 'stratosphere-eu';
(function() {
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
</body>
</html>