<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<title>Understanding the Parallelism of a Storm Topology</title>
<!-- Bootstrap core CSS -->
<link href="/assets/css/bootstrap.min.css" rel="stylesheet">
<!-- Bootstrap theme -->
<link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css">
<link href="/css/style.css" rel="stylesheet">
<link href="/assets/css/owl.theme.css" rel="stylesheet">
<link href="/assets/css/owl.carousel.css" rel="stylesheet">
<script type="text/javascript" src="/assets/js/jquery.min.js"></script>
<script type="text/javascript" src="/assets/js/bootstrap.min.js"></script>
<script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script>
<script type="text/javascript" src="/assets/js/storm.js"></script>
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<header>
<div class="container-fluid">
<div class="row">
<div class="col-md-10">
<a href="/index.html"><img src="/images/logo.png" class="logo" /></a>
</div>
<div class="col-md-2">
<a href="/downloads.html" class="btn-std btn-block btn-download">Download</a>
</div>
</div>
</div>
</header>
<!--Header End-->
<!--Navigation Begin-->
<div class="navbar" role="banner">
<div class="container-fluid">
<div class="navbar-header">
<button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
<ul class="nav navbar-nav">
<li><a href="/index.html" id="home">Home</a></li>
<li><a href="/getting-help.html" id="getting-help">Getting Help</a></li>
<li><a href="/about/integrates.html" id="project-info">Project Information</a></li>
<li><a href="/documentation.html" id="documentation">Documentation</a></li>
<li><a href="/talksAndVideos.html">Talks and Slideshows</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li>
<li><a href="/contribute/People.html">People</a></li>
<li><a href="/contribute/BYLAWS.html">ByLaws</a></li>
</ul>
</li>
<li><a href="/2015/11/05/storm096-released.html" id="news">News</a></li>
</ul>
</nav>
</div>
</div>
<div class="container-fluid">
<h1 class="page-title">Understanding the Parallelism of a Storm Topology</h1>
<div class="row">
<div class="col-md-12">
<!-- Documentation -->
<p class="post-meta"></p>
<h2 id="what-makes-a-running-topology-worker-processes-executors-and-tasks">What makes a running topology: worker processes, executors and tasks</h2>
<p>Storm distinguishes between the following three main entities that are used to actually run a topology in a Storm cluster:</p>
<ol>
<li>Worker processes</li>
<li>Executors (threads)</li>
<li>Tasks</li>
</ol>
<p>Here is a simple illustration of their relationships:</p>
<p><img src="images/relationships-worker-processes-executors-tasks.png" alt="The relationships of worker processes, executors (threads) and tasks in Storm"></p>
<p>A <em>worker process</em> executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. A running topology consists of many such processes running on many machines within a Storm cluster.</p>
<p>An <em>executor</em> is a thread that is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).</p>
<p>A <em>task</em> performs the actual data processing: each spout or bolt that you implement in your code runs as a number of tasks spread across the cluster. The number of tasks for a component stays the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. Because every executor runs at least one task, the following condition always holds: <code>#threads ≤ #tasks</code>. By default, the number of tasks equals the number of executors, i.e. Storm runs one task per thread.</p>
<h2 id="configuring-the-parallelism-of-a-topology">Configuring the parallelism of a topology</h2>
<p>Note that in Storm’s terminology &quot;parallelism&quot; specifically refers to the so-called <em>parallelism hint</em>, which is the initial number of executors (threads) of a component. In this document, though, we use the term &quot;parallelism&quot; in a more general sense to describe how you can configure not only the number of executors but also the number of worker processes and the number of tasks of a Storm topology. We will explicitly call out when &quot;parallelism&quot; is used in Storm's narrow sense.</p>
<p>The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options, and the sections below list only some of them. Storm currently has the following <a href="Configuration.html">order of precedence for configuration settings</a>: <code>defaults.yaml</code> &lt; <code>storm.yaml</code> &lt; topology-specific configuration &lt; internal component-specific configuration &lt; external component-specific configuration.</p>
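<p>The precedence order above can be read as a chain of map merges in which each later source overrides the earlier ones. The following is a minimal sketch of that idea in plain Java, not Storm's actual configuration loader; the class name <code>PrecedenceSketch</code>, the method <code>resolve</code> and all sample values are hypothetical.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: illustrates "later source wins", not Storm's real loader.
final class PrecedenceSketch {

    // Merge sources from lowest to highest precedence; later entries overwrite earlier ones.
    @SafeVarargs
    static Map<String, Object> resolve(Map<String, Object>... sourcesLowToHigh) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> source : sourcesLowToHigh) {
            merged.putAll(source);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Sample values only; they stand in for the three lowest precedence levels.
        Map<String, Object> defaultsYaml = Map.of("topology.workers", 1);
        Map<String, Object> stormYaml = Map.of("topology.workers", 2);
        Map<String, Object> topologyConf = Map.of("topology.workers", 4);
        System.out.println(resolve(defaultsYaml, stormYaml, topologyConf)); // {topology.workers=4}
    }
}
```

<p>Keys that a higher-precedence source does not set fall through from the lower ones, which is why a topology only needs to specify the values it wants to change.</p>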
<h3 id="number-of-worker-processes">Number of worker processes</h3>
<ul>
<li>Description: How many worker processes to create <em>for the topology</em> across machines in the cluster.</li>
<li>Configuration option: <a href="/javadoc/apidocs/backtype/storm/Config.html#TOPOLOGY_WORKERS">TOPOLOGY_WORKERS</a></li>
<li>How to set in your code (examples):
<ul>
<li><a href="/javadoc/apidocs/backtype/storm/Config.html">Config#setNumWorkers</a></li>
</ul></li>
</ul>
<h3 id="number-of-executors-threads">Number of executors (threads)</h3>
<ul>
<li>Description: How many executors to spawn <em>per component</em>.</li>
<li>Configuration option: None (pass <code>parallelism_hint</code> parameter to <code>setSpout</code> or <code>setBolt</code>)</li>
<li>How to set in your code (examples):
<ul>
<li><a href="/javadoc/apidocs/backtype/storm/topology/TopologyBuilder.html">TopologyBuilder#setSpout()</a></li>
<li><a href="/javadoc/apidocs/backtype/storm/topology/TopologyBuilder.html">TopologyBuilder#setBolt()</a></li>
<li>Note that as of Storm 0.8 the <code>parallelism_hint</code> parameter specifies the initial number of executors (not tasks!) for that component.</li>
</ul></li>
</ul>
<h3 id="number-of-tasks">Number of tasks</h3>
<ul>
<li>Description: How many tasks to create <em>per component</em>.</li>
<li>Configuration option: <a href="/javadoc/apidocs/backtype/storm/Config.html#TOPOLOGY_TASKS">TOPOLOGY_TASKS</a></li>
<li>How to set in your code (examples):
<ul>
<li><a href="/javadoc/apidocs/backtype/storm/topology/ComponentConfigurationDeclarer.html">ComponentConfigurationDeclarer#setNumTasks()</a></li>
</ul></li>
</ul>
<p>Here is an example code snippet to show these settings in practice:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">topologyBuilder</span><span class="o">.</span><span class="na">setBolt</span><span class="o">(</span><span class="s">"green-bolt"</span><span class="o">,</span> <span class="k">new</span> <span class="n">GreenBolt</span><span class="o">(),</span> <span class="mi">2</span><span class="o">)</span>
<span class="o">.</span><span class="na">setNumTasks</span><span class="o">(</span><span class="mi">4</span><span class="o">)</span>
<span class="o">.</span><span class="na">shuffleGrouping</span><span class="o">(</span><span class="s">"blue-spout"</span><span class="o">);</span>
</code></pre></div>
<p>In the above code we configured Storm to run the bolt <code>GreenBolt</code> with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm runs one task per executor by default.</p>
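<p>The arithmetic behind "two tasks per executor" can be sketched as follows. This is a hypothetical model that assumes tasks are spread as evenly as possible over executors; the class <code>TaskDistributionSketch</code> and its method are illustrative names, not part of the Storm API.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: assumes tasks are spread as evenly as possible over executors.
final class TaskDistributionSketch {

    // Returns the number of tasks assigned to each executor.
    static List<Integer> tasksPerExecutor(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #executors <= #tasks");
        }
        List<Integer> counts = new ArrayList<>();
        int base = numTasks / numExecutors;      // every executor gets at least this many
        int remainder = numTasks % numExecutors; // the first `remainder` executors get one extra
        for (int i = 0; i < numExecutors; i++) {
            counts.add(i < remainder ? base + 1 : base);
        }
        return counts;
    }

    public static void main(String[] args) {
        // GreenBolt from the snippet above: parallelism hint 2, setNumTasks(4).
        System.out.println(tasksPerExecutor(4, 2)); // [2, 2]
        // Default case: the number of tasks equals the number of executors.
        System.out.println(tasksPerExecutor(2, 2)); // [1, 1]
    }
}
```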
<h2 id="example-of-a-running-topology">Example of a running topology</h2>
<p>The following illustration shows what a simple topology would look like in operation. The topology consists of three components: one spout called <code>BlueSpout</code> and two bolts called <code>GreenBolt</code> and <code>YellowBolt</code>. The components are linked such that <code>BlueSpout</code> sends its output to <code>GreenBolt</code>, which in turn sends its own output to <code>YellowBolt</code>.</p>
<p><img src="images/example-of-a-running-topology.png" alt="Example of a running topology in Storm"></p>
<p>The <code>GreenBolt</code> was configured as per the code snippet above whereas <code>BlueSpout</code> and <code>YellowBolt</code> only set the parallelism hint (number of executors). Here is the relevant code:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">Config</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Config</span><span class="o">();</span>
<span class="n">conf</span><span class="o">.</span><span class="na">setNumWorkers</span><span class="o">(</span><span class="mi">2</span><span class="o">);</span> <span class="c1">// use two worker processes</span>
<span class="n">topologyBuilder</span><span class="o">.</span><span class="na">setSpout</span><span class="o">(</span><span class="s">"blue-spout"</span><span class="o">,</span> <span class="k">new</span> <span class="n">BlueSpout</span><span class="o">(),</span> <span class="mi">2</span><span class="o">);</span> <span class="c1">// set parallelism hint to 2</span>
<span class="n">topologyBuilder</span><span class="o">.</span><span class="na">setBolt</span><span class="o">(</span><span class="s">"green-bolt"</span><span class="o">,</span> <span class="k">new</span> <span class="n">GreenBolt</span><span class="o">(),</span> <span class="mi">2</span><span class="o">)</span>
<span class="o">.</span><span class="na">setNumTasks</span><span class="o">(</span><span class="mi">4</span><span class="o">)</span>
<span class="o">.</span><span class="na">shuffleGrouping</span><span class="o">(</span><span class="s">"blue-spout"</span><span class="o">);</span>
<span class="n">topologyBuilder</span><span class="o">.</span><span class="na">setBolt</span><span class="o">(</span><span class="s">"yellow-bolt"</span><span class="o">,</span> <span class="k">new</span> <span class="n">YellowBolt</span><span class="o">(),</span> <span class="mi">6</span><span class="o">)</span>
<span class="o">.</span><span class="na">shuffleGrouping</span><span class="o">(</span><span class="s">"green-bolt"</span><span class="o">);</span>
<span class="n">StormSubmitter</span><span class="o">.</span><span class="na">submitTopology</span><span class="o">(</span>
<span class="s">"mytopology"</span><span class="o">,</span>
<span class="n">conf</span><span class="o">,</span>
<span class="n">topologyBuilder</span><span class="o">.</span><span class="na">createTopology</span><span class="o">()</span>
<span class="o">);</span>
</code></pre></div>
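<p>To make the numbers in this example concrete, the sketch below adds up the executors and tasks configured above. It is plain bookkeeping, not Storm code: the class <code>TopologyTotalsSketch</code> and its nested <code>Component</code> type are hypothetical, and the per-worker figure assumes executors are spread evenly over the worker processes.</p>

```java
import java.util.List;

// Hypothetical bookkeeping for the example topology; not part of the Storm API.
final class TopologyTotalsSketch {

    static final class Component {
        final String name;
        final int executors; // parallelism hint
        final int tasks;     // equals the number of executors when not set explicitly

        Component(String name, int executors, int tasks) {
            this.name = name;
            this.executors = executors;
            this.tasks = tasks;
        }
    }

    static int totalExecutors(List<Component> components) {
        return components.stream().mapToInt(c -> c.executors).sum();
    }

    static int totalTasks(List<Component> components) {
        return components.stream().mapToInt(c -> c.tasks).sum();
    }

    public static void main(String[] args) {
        List<Component> topology = List.of(
            new Component("blue-spout", 2, 2),
            new Component("green-bolt", 2, 4),
            new Component("yellow-bolt", 6, 6));
        int workers = 2; // matches conf.setNumWorkers(2)
        System.out.println("executors = " + totalExecutors(topology)); // executors = 10
        System.out.println("tasks = " + totalTasks(topology));         // tasks = 12
        // Assumption: executors are spread evenly over the worker processes.
        System.out.println("executors per worker = " + totalExecutors(topology) / workers);
    }
}
```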
<p>Storm also provides additional configuration settings to control the parallelism of a topology, including:</p>
<ul>
<li><a href="/javadoc/apidocs/backtype/storm/Config.html#TOPOLOGY_MAX_TASK_PARALLELISM">TOPOLOGY_MAX_TASK_PARALLELISM</a>: This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. <a href="/javadoc/apidocs/backtype/storm/Config.html#setMaxTaskParallelism(int)">Config#setMaxTaskParallelism()</a>.</li>
</ul>
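<p>The effect of such a ceiling can be modelled in a few lines. This sketch only mirrors the description above and is not Storm's internal logic; the class <code>MaxParallelismSketch</code> and the method <code>effectiveExecutors</code> are hypothetical names.</p>

```java
// Hypothetical model of the ceiling described above; not Storm's internal code.
final class MaxParallelismSketch {

    // Returns the number of executors after applying the ceiling.
    // A null ceiling means the option is not set.
    static int effectiveExecutors(int parallelismHint, Integer maxTaskParallelism) {
        if (maxTaskParallelism == null) {
            return parallelismHint;
        }
        return Math.min(parallelismHint, maxTaskParallelism);
    }

    public static void main(String[] args) {
        System.out.println(effectiveExecutors(6, 3));    // 3
        System.out.println(effectiveExecutors(2, 3));    // 2
        System.out.println(effectiveExecutors(6, null)); // 6
    }
}
```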
<h2 id="how-to-change-the-parallelism-of-a-running-topology">How to change the parallelism of a running topology</h2>
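<p>A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without restarting the cluster or the topology. This is called rebalancing. The number of tasks of a component stays fixed, so the condition <code>#threads ≤ #tasks</code> still applies to the new number of executors.</p>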
<p>You have two options to rebalance a topology:</p>
<ol>
<li>Use the Storm web UI to rebalance the topology.</li>
<li>Use the CLI tool <code>storm rebalance</code> as described below.</li>
</ol>
<p>Here is an example of using the CLI tool:</p>
<div class="highlight"><pre><code class="language-" data-lang="">## Reconfigure the topology "mytopology" to use 5 worker processes,
## the spout "blue-spout" to use 3 executors and
## the bolt "yellow-bolt" to use 10 executors.
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
</code></pre></div>
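<p>Because the number of tasks of a component is fixed, it can help to check a planned rebalance against the condition <code>#threads ≤ #tasks</code> before running the command. The following sketch parses the <code>-e component=count</code> options from an argument list and performs that check. It does not call Storm and is not how the CLI is implemented; the class <code>RebalanceCheckSketch</code> and its methods are hypothetical.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses "-e component=count" pairs and checks them against
// fixed task counts. It does not call Storm or reproduce the CLI's own parsing.
final class RebalanceCheckSketch {

    // Extracts the requested executor count per component from CLI-style arguments.
    static Map<String, Integer> parseExecutorOptions(String... args) {
        Map<String, Integer> requested = new LinkedHashMap<>();
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-e")) {
                String[] pair = args[i + 1].split("=", 2);
                if (pair.length != 2) {
                    throw new IllegalArgumentException("expected component=count: " + args[i + 1]);
                }
                requested.put(pair[0], Integer.parseInt(pair[1]));
            }
        }
        return requested;
    }

    // True if the request keeps 1 <= #executors <= #tasks for the component.
    static boolean respectsTaskCount(int requestedExecutors, int numTasks) {
        return requestedExecutors >= 1 && requestedExecutors <= numTasks;
    }

    public static void main(String[] args) {
        Map<String, Integer> requested = parseExecutorOptions(
            "-n", "5", "-e", "blue-spout=3", "-e", "yellow-bolt=10");
        // Task counts of the example topology shown earlier: 2 and 6.
        Map<String, Integer> tasks = Map.of("blue-spout", 2, "yellow-bolt", 6);
        requested.forEach((component, executors) ->
            System.out.println(component + ": requested " + executors + " executors, "
                + tasks.get(component) + " tasks, within limit = "
                + respectsTaskCount(executors, tasks.get(component))));
    }
}
```

<p>With the task counts of the example topology (2 for <code>blue-spout</code>, 6 for <code>yellow-bolt</code>), both requested values exceed the number of tasks. Under the condition stated earlier, a topology that is expected to scale this far would need to be submitted with correspondingly more tasks, for example via <code>setNumTasks</code>.</p>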
<h2 id="references">References</h2>
<ul>
<li><a href="Concepts.html">Concepts</a></li>
<li><a href="Configuration.html">Configuration</a></li>
<li><a href="Running-topologies-on-a-production-cluster.html">Running topologies on a production cluster</a></li>
<li><a href="Local-mode.html">Local mode</a></li>
<li><a href="/tutorial.html">Tutorial</a></li>
<li><a href="/javadoc/apidocs/">Storm API documentation</a>, most notably the class <code>Config</code></li>
</ul>
</div>
</div>
</div>
<footer>
<div class="container-fluid">
<div class="row">
<div class="col-md-3">
<div class="footer-widget">
<h5>Meetups</h5>
<ul class="latest-news">
<li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm &amp; Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li>
<li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm &amp; Kafka Users</a> <span class="small">(Seattle, WA)</span></li>
<li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li>
<li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li>
<li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li>
<li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li>
<!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> -->
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>About Storm</h5>
<p>Storm integrates with any queueing system and any database system. Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Storm with database systems is easy.</p>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>First Look</h5>
<ul class="footer-list">
<li><a href="/documentation/Rationale.html">Rationale</a></li>
<li><a href="/tutorial.html">Tutorial</a></li>
<li><a href="/documentation/Setting-up-development-environment.html">Setting up development environment</a></li>
<li><a href="/documentation/Creating-a-new-Storm-project.html">Creating a new Storm project</a></li>
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>Documentation</h5>
<ul class="footer-list">
<li><a href="/doc-index.html">Index</a></li>
<li><a href="/documentation.html">Manual</a></li>
<li><a href="https://storm.apache.org/javadoc/apidocs/index.html">Javadoc</a></li>
<li><a href="/documentation/FAQ.html">FAQ</a></li>
</ul>
</div>
</div>
</div>
<hr/>
<div class="row">
<div class="col-md-12">
<p align="center">Copyright © 2015 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.
<br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation.
<br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
</div>
</div>
</div>
</footer>
<!--Footer End-->
<!-- Scroll to top -->
<span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span>
</body>
</html>