blob: 8ab468b94c4cfc3466482bc45b758eaa3741d2dc [file] [log] [blame]
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<title>Setting up a Storm Cluster</title>
<!-- Bootstrap core CSS -->
<link href="/assets/css/bootstrap.min.css" rel="stylesheet">
<!-- Bootstrap theme -->
<link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css">
<link href="/css/style.css" rel="stylesheet">
<link href="/assets/css/owl.theme.css" rel="stylesheet">
<link href="/assets/css/owl.carousel.css" rel="stylesheet">
<script type="text/javascript" src="/assets/js/jquery.min.js"></script>
<script type="text/javascript" src="/assets/js/bootstrap.min.js"></script>
<script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script>
<script type="text/javascript" src="/assets/js/storm.js"></script>
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<header>
<div class="container-fluid">
<div class="row">
<div class="col-md-5">
<a href="/index.html"><img src="/images/logo.png" class="logo" /></a>
</div>
<div class="col-md-5">
<h1>Version: 2.3.0</h1>
</div>
<div class="col-md-2">
<a href="/downloads.html" class="btn-std btn-block btn-download">Download</a>
</div>
</div>
</div>
</header>
<!--Header End-->
<!--Navigation Begin-->
<div class="navbar" role="banner">
<div class="container-fluid">
<div class="navbar-header">
<button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
<ul class="nav navbar-nav">
<li><a href="/index.html" id="home">Home</a></li>
<li><a href="/getting-help.html" id="getting-help">Getting Help</a></li>
<li><a href="/about/integrates.html" id="project-info">Project Information</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="documentation">Documentation <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/releases/2.3.0/index.html">2.3.0</a></li>
<li><a href="/releases/2.2.0/index.html">2.2.0</a></li>
<li><a href="/releases/2.1.0/index.html">2.1.0</a></li>
<li><a href="/releases/2.0.0/index.html">2.0.0</a></li>
<li><a href="/releases/1.2.3/index.html">1.2.3</a></li>
</ul>
</li>
<li><a href="/talksAndVideos.html">Talks and Slideshows</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li>
<li><a href="/contribute/People.html">People</a></li>
<li><a href="/contribute/BYLAWS.html">ByLaws</a></li>
</ul>
</li>
<li><a href="/2021/09/27/storm230-released.html" id="news">News</a></li>
</ul>
</nav>
</div>
</div>
<div class="container-fluid">
<h1 class="page-title">Setting up a Storm Cluster</h1>
<div class="row">
<div class="col-md-12">
<!-- Documentation -->
<p class="post-meta"></p>
<div class="documentation-content"><p>This page outlines the steps for getting a Storm cluster up and running. </p>
<p>If you run into difficulties with your Storm cluster, first check for a solution is in the <a href="Troubleshooting.html">Troubleshooting</a> page. Otherwise, email the mailing list.</p>
<p>Here&#39;s a summary of the steps for setting up a Storm cluster:</p>
<ol>
<li>Set up a Zookeeper cluster</li>
<li>Install dependencies on Nimbus and worker machines</li>
<li>Download and extract a Storm release to Nimbus and worker machines</li>
<li>Fill in mandatory configurations into storm.yaml</li>
<li>Launch daemons under supervision using &quot;storm&quot; script and a supervisor of your choice</li>
<li>Setup DRPC servers (Optional)</li>
</ol>
<h3 id="set-up-a-zookeeper-cluster">Set up a Zookeeper cluster</h3>
<p>Storm uses Zookeeper for coordinating the cluster. Zookeeper <strong>is not</strong> used for message passing, so the load Storm places on Zookeeper is quite low. Single node Zookeeper clusters should be sufficient for most cases, but if you want failover or are deploying large Storm clusters you may want larger Zookeeper clusters. Instructions for deploying Zookeeper are <a href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html">here</a>.</p>
<p>A few notes about Zookeeper deployment:</p>
<ol>
<li>It&#39;s critical that you run Zookeeper under supervision, since Zookeeper is fail-fast and will exit the process if it encounters any error case. See <a href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_supervision">here</a> for more details.</li>
<li>It&#39;s critical that you set up a cron to compact Zookeeper&#39;s data and transaction logs. The Zookeeper daemon does not do this on its own, and if you don&#39;t set up a cron, Zookeeper will quickly run out of disk space. See <a href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_maintenance">here</a> for more details.</li>
</ol>
<h3 id="install-dependencies-on-nimbus-and-worker-machines">Install dependencies on Nimbus and worker machines</h3>
<p>Next you need to install Storm&#39;s dependencies on Nimbus and the worker machines. These are:</p>
<ol>
<li>Java 8+ (Apache Storm 2.x is tested through travis ci against a java 8 JDK)</li>
<li>Python 2.7.x or Python 3.x</li>
</ol>
<p>These are the versions of the dependencies that have been tested with Storm. Storm may or may not work with different versions of Java and/or Python.</p>
<h3 id="download-and-extract-a-storm-release-to-nimbus-and-worker-machines">Download and extract a Storm release to Nimbus and worker machines</h3>
<p>Next, download a Storm release and extract the zip file somewhere on Nimbus and each of the worker machines. The Storm releases can be downloaded <a href="../../downloads.html">from here</a>.</p>
<h3 id="fill-in-mandatory-configurations-into-storm-yaml">Fill in mandatory configurations into storm.yaml</h3>
<p>The Storm release contains a file at <code>conf/storm.yaml</code> that configures the Storm daemons. You can see the default configuration values <a href="http://github.com/apache/storm/blob/v2.3.0/conf/defaults.yaml">here</a>. storm.yaml overrides anything in defaults.yaml. There&#39;s a few configurations that are mandatory to get a working cluster:</p>
<p>1) <strong>storm.zookeeper.servers</strong>: This is a list of the hosts in the Zookeeper cluster for your Storm cluster. It should look something like:</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">storm.zookeeper.servers</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">111.222.333.444"</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">555.666.777.888"</span>
</code></pre></div>
<p>If the port that your Zookeeper cluster uses is different than the default, you should set <strong>storm.zookeeper.port</strong> as well.</p>
<p>2) <strong>storm.local.dir</strong>: The Nimbus and Supervisor daemons require a directory on the local disk to store small amounts of state (like jars, confs, and things like that).
You should create that directory on each machine, give it proper permissions, and then fill in the directory location using this config. For example:</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">storm.local.dir</span><span class="pi">:</span> <span class="s2">"</span><span class="s">/mnt/storm"</span>
</code></pre></div>
<p>If you run storm on windows, it could be:</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">storm.local.dir</span><span class="pi">:</span> <span class="s2">"</span><span class="s">C:</span><span class="se">\\</span><span class="s">storm-local"</span>
</code></pre></div>
<p>If you use a relative path, it will be relative to where you installed storm(STORM_HOME).
You can leave it empty with default value <code>$STORM_HOME/storm-local</code></p>
<p>3) <strong>nimbus.seeds</strong>: The worker nodes need to know which machines are the candidate of master in order to download topology jars and confs. For example:</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">nimbus.seeds</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">111.222.333.44"</span><span class="pi">]</span>
</code></pre></div>
<p>You&#39;re encouraged to fill out the value to list of <strong>machine&#39;s FQDN</strong>. If you want to set up Nimbus H/A, you have to address all machines&#39; FQDN which run nimbus. You may want to leave it to default value when you just want to set up &#39;pseudo-distributed&#39; cluster, but you&#39;re still encouraged to fill out FQDN.</p>
<p>4) <strong>supervisor.slots.ports</strong>: For each worker machine, you configure how many workers run on that machine with this config. Each worker uses a single port for receiving messages, and this setting defines which ports are open for use. If you define five ports here, then Storm will allocate up to five workers to run on this machine. If you define three ports, Storm will only run up to three. By default, this setting is configured to run 4 workers on the ports 6700, 6701, 6702, and 6703. For example:</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">supervisor.slots.ports</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">6700</span>
<span class="pi">-</span> <span class="s">6701</span>
<span class="pi">-</span> <span class="s">6702</span>
<span class="pi">-</span> <span class="s">6703</span>
</code></pre></div>
<p>5) <strong>drpc.servers</strong>: If you want to setup DRPC servers they need to specified so that the workers can find them. This should be a list of the DRPC servers. For example:</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">drpc.servers</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">111.222.333.44"</span><span class="pi">]</span>
</code></pre></div>
<h3 id="monitoring-health-of-supervisors">Monitoring Health of Supervisors</h3>
<p>Storm provides a mechanism by which administrators can configure the supervisor to run administrator supplied scripts periodically to determine if a node is healthy or not. Administrators can have the supervisor determine if the node is in a healthy state by performing any checks of their choice in scripts located in storm.health.check.dir. If a script detects the node to be in an unhealthy state, it must return a non-zero exit code. In pre-Storm 2.x releases, a bug considered a script exit value of 0 to be a failure. This has now been fixed. The supervisor will periodically run the scripts in the health check dir and check the output. If the script’s output contains the string ERROR, as described above, the supervisor will shut down any workers and exit.</p>
<p>If the supervisor is running with supervision &quot;/bin/storm node-health-check&quot; can be called to determine if the supervisor should be launched or if the node is unhealthy.</p>
<p>The health check directory location can be configured with:</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">storm.health.check.dir</span><span class="pi">:</span> <span class="s2">"</span><span class="s">healthchecks"</span>
</code></pre></div>
<p>The scripts must have execute permissions.
The time to allow any given healthcheck script to run before it is marked failed due to timeout can be configured with:</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="s">storm.health.check.timeout.ms</span><span class="pi">:</span> <span class="s">5000</span>
</code></pre></div>
<h3 id="configure-external-libraries-and-environment-variables-optional">Configure external libraries and environment variables (optional)</h3>
<p>If you need support from external libraries or custom plugins, you can place such jars into the extlib/ and extlib-daemon/ directories. Note that the extlib-daemon/ directory stores jars used only by daemons (Nimbus, Supervisor, DRPC, UI, Logviewer), e.g., HDFS and customized scheduling libraries. Accordingly, two environment variables STORM_EXT_CLASSPATH and STORM_EXT_CLASSPATH_DAEMON can be configured by users for including the external classpath and daemon-only external classpath. See <a href="Classpath-handling.html">Classpath handling</a> for more details on using external libraries.</p>
<h3 id="launch-daemons-under-supervision-using-storm-script-and-a-supervisor-of-your-choice">Launch daemons under supervision using &quot;storm&quot; script and a supervisor of your choice</h3>
<p>The last step is to launch all the Storm daemons. It is critical that you run each of these daemons under supervision. Storm is a <strong>fail-fast</strong> system which means the processes will halt whenever an unexpected error is encountered. Storm is designed so that it can safely halt at any point and recover correctly when the process is restarted. This is why Storm keeps no state in-process -- if Nimbus or the Supervisors restart, the running topologies are unaffected. Here&#39;s how to run the Storm daemons:</p>
<ol>
<li><strong>Nimbus</strong>: Run the command <code>bin/storm nimbus</code> under supervision on the master machine.</li>
<li><strong>Supervisor</strong>: Run the command <code>bin/storm supervisor</code> under supervision on each worker machine. The supervisor daemon is responsible for starting and stopping worker processes on that machine.</li>
<li><strong>UI</strong>: Run the Storm UI (a site you can access from the browser that gives diagnostics on the cluster and topologies) by running the command &quot;bin/storm ui&quot; under supervision. The UI can be accessed by navigating your web browser to http://{ui host}:8080.</li>
</ol>
<p>As you can see, running the daemons is very straightforward. The daemons will log to the logs/ directory in wherever you extracted the Storm release.</p>
<h3 id="setup-drpc-servers-optional">Setup DRPC servers (Optional)</h3>
<p>Just like with nimbus or the supervisors you will need to launch the drpc server. To do this run the command <code>bin/storm drpc</code> on each of the machines that you configured as a part of the <code>drpc.servers</code> config.</p>
<h4 id="drpc-http-setup">DRPC Http Setup</h4>
<p>DRPC optionally offers a REST API as well. To enable this set teh config <code>drpc.http.port</code> to the port you want to run on before launching the DRPC server. See the <a href="STORM-UI-REST-API.html">REST documentation</a> for more information on how to use it.</p>
<p>It also supports SSL by setting <code>drpc.https.port</code> along with the keystore and optional truststore similar to how you would configure the UI.</p>
</div>
</div>
</div>
</div>
<footer>
<div class="container-fluid">
<div class="row">
<div class="col-md-3">
<div class="footer-widget">
<h5>Meetups</h5>
<ul class="latest-news">
<li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li>
<li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li>
<li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li>
<li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li>
<li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li>
<li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li>
<!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> -->
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>About Apache Storm</h5>
<p>Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Apache Storm with database systems is easy.</p>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>First Look</h5>
<ul class="footer-list">
<li><a href="/releases/current/Rationale.html">Rationale</a></li>
<li><a href="/releases/current/Tutorial.html">Tutorial</a></li>
<li><a href="/releases/current/Setting-up-development-environment.html">Setting up development environment</a></li>
<li><a href="/releases/current/Creating-a-new-Storm-project.html">Creating a new Apache Storm project</a></li>
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>Documentation</h5>
<ul class="footer-list">
<li><a href="/releases/current/index.html">Index</a></li>
<li><a href="/releases/current/javadocs/index.html">Javadoc</a></li>
<li><a href="/releases/current/FAQ.html">FAQ</a></li>
</ul>
</div>
</div>
</div>
<hr/>
<div class="row">
<div class="col-md-12">
<p align="center">Copyright © 2019 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.
<br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation.
<br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
</div>
</div>
</div>
</footer>
<!--Footer End-->
<!-- Scroll to top -->
<span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span>
</body>
</html>