content/releases/current/Performance.html - storm-site - Git at Google

 <!DOCTYPE html>
 <html>
     <head>
     <meta charset="utf-8">
     <meta http-equiv="X-UA-Compatible" content="IE=edge">
     <meta name="viewport" content="width=device-width, initial-scale=1">

     <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
     <link rel="icon" href="/favicon.ico" type="image/x-icon">

     <title>Performance Tuning</title>

     <!-- Bootstrap core CSS -->
     <link href="/assets/css/bootstrap.min.css" rel="stylesheet">
     <!-- Bootstrap theme -->
     <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">

     <!-- Custom styles for this template -->
     <link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css">
     <link href="/css/style.css" rel="stylesheet">
     <link href="/assets/css/owl.theme.css" rel="stylesheet">
     <link href="/assets/css/owl.carousel.css" rel="stylesheet">
     <script type="text/javascript" src="/assets/js/jquery.min.js"></script>
     <script type="text/javascript" src="/assets/js/bootstrap.min.js"></script>
     <script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script>
     <script type="text/javascript" src="/assets/js/storm.js"></script>
     <!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
     <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->

     <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
     <!--[if lt IE 9]>
       <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
       <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
     <![endif]-->
   </head>


   <body>
     <header>
   <div class="container-fluid">
      <div class="row">
           <div class="col-md-5">
             <a href="/index.html"><img src="/images/logo.png" class="logo" /></a>
           </div>
           <div class="col-md-5">

               <h1>Version: 2.3.0</h1>

           </div>
           <div class="col-md-2">
             <a href="/downloads.html" class="btn-std btn-block btn-download">Download</a>
           </div>
         </div>
     </div>
 </header>
 <!--Header End-->
 <!--Navigation Begin-->
 <div class="navbar" role="banner">
   <div class="container-fluid">
       <div class="navbar-header">
           <button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
                 <span class="icon-bar"></span>
                 <span class="icon-bar"></span>
                 <span class="icon-bar"></span>
             </button>
         </div>
         <nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
           <ul class="nav navbar-nav">
               <li><a href="/index.html" id="home">Home</a></li>
                 <li><a href="/getting-help.html" id="getting-help">Getting Help</a></li>
                 <li><a href="/about/integrates.html" id="project-info">Project Information</a></li>
                 <li class="dropdown">
                     <a href="#" class="dropdown-toggle" data-toggle="dropdown" id="documentation">Documentation <b class="caret"></b></a>
                     <ul class="dropdown-menu">


                           <li><a href="/releases/2.3.0/index.html">2.3.0</a></li>


                           <li><a href="/releases/2.2.1/index.html">2.2.1</a></li>


                           <li><a href="/releases/2.2.0/index.html">2.2.0</a></li>


                           <li><a href="/releases/2.1.1/index.html">2.1.1</a></li>


                           <li><a href="/releases/2.1.0/index.html">2.1.0</a></li>


                           <li><a href="/releases/2.0.0/index.html">2.0.0</a></li>


                           <li><a href="/releases/1.2.4/index.html">1.2.4</a></li>


                           <li><a href="/releases/1.2.3/index.html">1.2.3</a></li>


                     </ul>
                 </li>
                 <li><a href="/talksAndVideos.html">Talks and Slideshows</a></li>
                 <li class="dropdown">
                     <a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a>
                     <ul class="dropdown-menu">
                         <li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li>
                         <li><a href="/contribute/People.html">People</a></li>
                         <li><a href="/contribute/BYLAWS.html">ByLaws</a></li>
                         <li><a href="/Powered-By.html">PoweredBy</a></li>
                     </ul>
                 </li>
                 <li><a href="/2021/10/14/storm211-released.html" id="news">News</a></li>
             </ul>
         </nav>
     </div>
 </div>


     <div class="container-fluid">
     <h1 class="page-title">Performance Tuning</h1>
           <div class="row">
            	<div class="col-md-12">
 	             <!-- Documentation -->

 <p class="post-meta"></p>

 <div class="documentation-content"><p>Latency, throughput and resource consumption are the three key dimensions involved in performance tuning.
 In the following sections we discuss the settings that can used to tune along these dimension and understand the trade-offs.</p>

 <p>It is important to understand that these settings can vary depending on the topology, the type of hardware and the number of hosts used by the topology.</p>

 <h2 id="1-buffer-size">1. Buffer Size</h2>

 <p>Spouts and Bolts operate asynchronously using message passing. Message queues used for this purpose are of fixed but configurable size. Buffer size
 refers to the size of these queues. Every consumer has its own receive queue. The messages wait in the queue until the consumer is ready to process them.
 The queue will typically be almost empty or almost full depending whether the consumer is operating faster or slower than the rate at which producers
 are generating messages for it. Storm queues always have a single consumer and potentially multiple producers. There are two buffer size settings
 of interest:</p>

 <ul>
 <li><code>topology.executor.receive.buffer.size</code> : This is the size of the message queue employed for each spout and bolt executor.</li>
 <li><code>topology.transfer.buffer.size</code> : This is the size of the outbound message queue used for inter-worker messaging. This queue is referred to as
 the <em>Worker Transfer Queue</em>.</li>
 </ul>

 <p><strong>Note:</strong> If the specified buffer size is not a power of 2, it is internally rounded up to the next power of 2.</p>

 <h4 id="guidance">Guidance</h4>

 <p>Very small message queues (size &lt; 1024) are likely to hamper throughput by not providing enough isolation between the consumer and producer. This
 can affect the asynchronous nature of the processing as the producers are likely to find the downstream queue to be full.</p>

 <p>Very large message queues are also not desirable to deal with slow consumers. Better to employ more consumers (i.e. bolts) on additional CPU cores instead. If queues
 are large and often full, the messages will end up waiting longer in these queues at each step of the processing, leading to poor latency being
 reported on the Storm UI. Large queues also imply higher memory consumption especially if the queues are typically full.</p>

 <h2 id="2-batch-size">2. Batch Size</h2>

 <p>Producers can either write a batch of messages to the consumer&#39;s queue or write each message individually. This batch size can be configured.
 Inserting messages in batches to downstream queues helps reduce the number of synchronization operations required for the inserts. Consequently this helps achieve higher throughput. However,
 sometimes it may take a little time for the buffer to fill up, before it is flushed into the downstream queue. This implies that the buffered messages
 will take longer to become visible to the downstream consumer who is waiting to process them. This can increase the average end-to-end latency for
 these messages. The latency can get very bad if the batch sizes are large and the topology is not experiencing high traffic.</p>

 <ul>
 <li><p><code>topology.producer.batch.size</code> : The batch size for writes into the receive queue of any spout/bolt is controlled via this setting. This setting
 impacts the communication within a worker process. Each upstream producer maintains a separate batch to a component&#39;s receive queue. So if two spout
 instances are writing to the same downstream bolt instance, each of the spout instances will have maintain a separate batch.</p></li>
 <li><p><code>topology.transfer.batch.size</code> : Messages that are destined to a spout/bolt running on a different worker process, are sent to a queue called
 the <strong>Worker Transfer Queue</strong>. The Worker Transfer Thread is responsible for draining the messages in this queue and send them to the appropriate
 worker process over the network. This setting controls the batch size for writes into the Worker Transfer Queue.  This impacts the communication
 between worker processes.</p></li>
 </ul>

 <h4 id="guidance">Guidance</h4>

 <p><strong>For Low latency:</strong> Set batch size to 1. This basically disables batching. This is likely to reduce peak sustainable throughput under heavy traffic, but
 not likely to impact throughput much under low/medium traffic situations.</p>

 <p><strong>For High throughput:</strong> Set batch size &gt; 1. Try values like 10, 100, 1000 or even higher and see what yields the best throughput for the topology.
 Beyond a certain point the throughput is likely to get worse.</p>

 <p><strong>Varying throughput:</strong> Topologies often experience fluctuating amounts of incoming traffic over the day. Other topos may experience higher traffic in some
 paths and lower throughput in other paths simultaneously. If latency is not a concern, a small bach size (e.g. 10) and in conjunction with the right flush
 frequency may provide a reasonable compromise for such scenarios. For meeting stricter latency SLAs, consider setting it to 1.</p>

 <h2 id="3-flush-tuple-frequency">3. Flush Tuple Frequency</h2>

 <p>In low/medium traffic situations or when batch size is too large, the batches may take too long to fill up and consequently the messages could take unacceptably
 long time to become visible to downstream components. In such case, periodic flushing of batches is necessary to keep the messages moving and avoid compromising
 latencies when batching is enabled.</p>

 <p>When batching has been enabled, special messages called <em>flush tuples</em> are inserted periodically into the receive queues of all spout and bolt instances.
 This causes each spout/bolt instance to flush all its outstanding batches to their respective downstream components.</p>

 <p><code>topology.flush.tuple.freq.millis</code> : This setting controls how often the flush tuples are generated. Flush tuples are not generated if this configuration is
 set to 0 or if (<code>topology.producer.batch.size</code>=1 and <code>topology.transfer.batch.size</code>=1).</p>

 <h4 id="guidance">Guidance</h4>

 <p>Flushing interval can be used as tool to retain the higher throughput benefits of batching and avoid batched messages getting stuck for too long waiting for their.
 batch to fill. Preferably this value should be larger than the average execute latencies of the bolts in the topology. Trying to flush the queues more frequently than
 the amount of time it takes to produce the messages may hurt performance. Understanding the average execute latencies of each bolt will help determine the average
 number of messages in the queues between two flushes.</p>

 <p><strong>For Low latency:</strong> A smaller value helps achieve tighter latency SLAs.</p>

 <p><strong>For High throughput:</strong>  When trying to maximize throughput under high traffic situations, the batches are likely to get filled and flushed automatically.
 To optimize for such cases, this value can be set to a higher number.</p>

 <p><strong>Varying throughput:</strong> If latency is not a concern, a larger value will optimize for high traffic situations. For meeting tighter SLAs set this to lower
 values.</p>

 <h2 id="4-wait-strategy">4. Wait Strategy</h2>

 <p>Wait strategies are used to conserve CPU usage by trading off some latency and throughput. They are applied for the following situations:</p>

 <p>4.1 <strong>Spout Wait:</strong>  In low/no traffic situations, Spout&#39;s nextTuple() may not produce any new emits. To prevent invoking the Spout&#39;s nextTuple too often,
 this wait strategy is used between nextTuple() calls, allowing the spout&#39;s executor thread to idle and conserve CPU. Spout wait strategy is also used
 when the <code>topology.max.spout.pending</code> limit has been reached when ACKers are enabled. Select a strategy using <code>topology.spout.wait.strategy</code>. Configure the
 chosen wait strategy using one of the <code>topology.spout.wait.*</code> settings.</p>

 <p>4.2 <strong>Bolt Wait:</strong> : When a bolt polls it&#39;s receive queue for new messages to process, it is possible that the queue is empty. This typically happens
 in case of low/no traffic situations or when the upstream spout/bolt is inherently slower. This wait strategy is used in such cases. It avoids high CPU usage
 due to the bolt continuously checking on a typically empty queue. Select a strategy using <code>topology.bolt.wait.strategy</code>. The chosen strategy can be further configured
 using the <code>topology.bolt.wait.*</code> settings.</p>

 <p>4.3 <strong>Backpressure Wait</strong> : Select a strategy using <code>topology.backpressure.wait.strategy</code>. When a spout/bolt tries to write to a downstream component&#39;s receive queue,
 there is a possibility that the queue is full. In such cases the write needs to be retried. This wait strategy is used to induce some idling in-between re-attempts for
 conserving CPU. The chosen strategy can be further configured using the <code>topology.backpressure.wait.*</code> settings.</p>

 <h4 id="built-in-wait-strategies">Built-in wait strategies:</h4>

 <p>These wait strategies are available for use with all of the above mentioned wait situations.</p>

 <ul>
 <li><strong>ProgressiveWaitStrategy</strong> : This strategy can be used for Bolt Wait or Backpressure Wait situations. Set the strategy to &#39;org.apache.storm.policy.WaitStrategyProgressive&#39; to
 select this wait strategy. This is a dynamic wait strategy that enters into progressively deeper states of CPU conservation if the Backpressure Wait or Bolt Wait situations persist.
 It has 3 levels of idling and allows configuring how long to stay at each level :</li>
 </ul>

 <ol>
 <li><p>Level1 / No Waiting - The first few times it will return immediately. This does not conserve any CPU. The number of times it remains in this state is configured using
 <code>topology.spout.wait.progressive.level1.count</code> or <code>topology.bolt.wait.progressive.level1.count</code> or <code>topology.backpressure.wait.progressive.level1.count</code> depending which
 situation it is being used.</p></li>
 <li><p>Level 2 / Park Nanos - In this state it disables the current thread for thread scheduling purposes, for 1 nano second using LockSupport.parkNanos(). This puts the CPU in a minimal
 conservation state. It remains in this state for <code>topology.spout.wait.progressive.level2.count</code> or <code>topology.bolt.wait.progressive.level2.count</code> or
 <code>topology.backpressure.wait.progressive.level2.count</code> iterations.</p></li>
 <li><p>Level 3 / Thread.sleep() - In this level it calls Thread.sleep() with the value specified in <code>topology.spout.wait.progressive.level3.sleep.millis</code> or
 <code>topology.bolt.wait.progressive.level3.sleep.millis</code> or <code>topology.backpressure.wait.progressive.level3.sleep.millis</code>. This is the most CPU conserving level and it remains in
 this level for the remaining iterations.</p></li>
 </ol>

 <ul>
 <li><strong>ParkWaitStrategy</strong> : This strategy can be used for Bolt Wait or Backpressure Wait situations. Set the strategy to <code>org.apache.storm.policy.WaitStrategyPark</code> to use this.
 This strategy disables the current thread for thread scheduling purposes by calling LockSupport.parkNanos(). The amount of park time is configured using either
 <code>topology.bolt.wait.park.microsec</code> or <code>topology.backpressure.wait.park.microsec</code> based on the wait situation it is used. Setting the park time to 0, effectively disables
 invocation of LockSupport.parkNanos and this mode can be used to achieve busy polling (which at the cost of high CPU utilization even when idle, may improve latency and/or throughput).</li>
 </ul>

 <h2 id="5-max-spout-pending">5. Max.spout.pending</h2>

 <p>The setting <code>topology.max.spout.pending</code> limits the number of un-ACKed tuples at the spout level. Once a spout reaches this limit, the spout&#39;s nextTuple()
 method will not be called until some ACKs are received for the outstanding emits. This setting does not have any affect if ACKing is disabled. It
 is a spout throttling mechanism which can impact throughput and latency. Setting it to null disables it for storm-core topologies. Impact on throughput
 is dependent on the topology and its concurrency (workers/executors), so experimentation is necessary to determine optimal setting. Latency and memory consumption
 is expected to typically increase with higher and higher values for this.</p>

 <h2 id="6-load-aware-messaging">6. Load Aware messaging</h2>

 <p>When load aware messaging is enabled (default), shuffle grouping takes additional factors into consideration for message routing.
 Impact of this on performance is dependent on the topology and its deployment footprint (i.e. distribution over process and machines).
 Consequently it is useful to assess the impact of setting <code>topology.disable.loadaware.messaging</code> to <code>true</code> or <code>false</code> for your
 specific case.</p>

 <h2 id="7-sampling-rate">7. Sampling Rate</h2>

 <p>Sampling rate is used to control how often certain metrics are computed on the Spout and Bolt executors. This is configured using <code>topology.stats.sample.rate</code>
 Setting it to 1 means, the stats are computed for every emitted message. As an example, to sample once every 1000 messages it can be set to  0.001. It may be
 possible to improve throughput and latency by reducing the sampling rate.</p>

 <h2 id="8-budgeting-cpu-cores-for-executors">8. Budgeting CPU cores for Executors</h2>

 <p>There are three main types of executors (i.e threads) to take into account when budgeting CPU cores for them. Spout Executors, Bolt Executors, Worker Transfer (handles outbound
 messages) and NettyWorker (handles inbound messages).
 The first two are used to run spout, bolt and acker instances. The Worker Transfer thread is used to serialize and send messages to other workers (in multi-worker mode).</p>

 <p>Executors that are expected to remain busy, either because they are handling a lot of messages, or because their processing is inherently CPU intensive, should be allocated
 1 physical core each. Allocating logical cores (instead of physical) or less than 1 physical core for CPU intensive executors increases CPU contention and performance can suffer.
 Executors that are not expected to be busy can be allocated a smaller fraction of the physical core (or even logical cores). It maybe not be economical to allocate a full physical
 core for executors that are not likely to saturate the CPU.</p>

 <p>The <em>system bolt</em> generally processes very few messages per second, and so requires very little cpu (typically less than 10% of a physical core).</p>

 <h2 id="9-garbage-collection">9. Garbage Collection</h2>

 <p>Choice of GC is an important concern for topologies that are latency or throughput sensitive. It is recommended to try the both the CMS and G1 collectors. Performance characteristics
 of the collectors can change between single and multiworker modes and is dependent on hardware characteristics such as number of CPUs and memory localities. Number of GC threads can
 also affect performance. Sometimes fewer GC threads can yield better performance. It is advisable to select a collector and tune it by mimicking anticipated peak data rates on hardware
 similar to what is used in production.</p>

 <h2 id="10-scaling-out-with-single-worker-mode">10. Scaling out with Single Worker mode</h2>

 <p>Communication between executors within a worker process is very fast as there is neither a need to serialize and deserialize messages nor does it involve communicating over the network
 stack. In multiworker mode, messages often cross worker process boundaries. For performance sensitive cases, if it is possible to configure a topology to run as many single-worker
 instances (for ex. one worker per input partition) rather than one multiworker instance, it may yield significantly better throughput and latency on the same hardware.
 The downside to this approach is that it adds the overhead of monitoring and managing many instances rather than one multiworker instance.</p>
 </div>


 	          </div>
 	       </div>
 	  </div>
 <footer>
     <div class="container-fluid">
         <div class="row">
             <div class="col-md-3">
                 <div class="footer-widget">
                     <h5>Meetups</h5>
                     <ul class="latest-news">

                         <li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li>

                         <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li>

                         <li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li>

                         <li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li>

                         <li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li>

                         <li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li>

                         <!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> -->
                     </ul>
                 </div>
             </div>
             <div class="col-md-3">
                 <div class="footer-widget">
                     <h5>About Apache Storm</h5>
                     <p>Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Apache Storm with database systems is easy.</p>
                </div>
             </div>
             <div class="col-md-3">
                 <div class="footer-widget">
                     <h5>First Look</h5>
                     <ul class="footer-list">
                         <li><a href="/releases/current/Rationale.html">Rationale</a></li>
                         <li><a href="/releases/current/Tutorial.html">Tutorial</a></li>
                         <li><a href="/releases/current/Setting-up-development-environment.html">Setting up development environment</a></li>
                         <li><a href="/releases/current/Creating-a-new-Storm-project.html">Creating a new Apache Storm project</a></li>
                     </ul>
                 </div>
             </div>
             <div class="col-md-3">
                 <div class="footer-widget">
                     <h5>Documentation</h5>
                     <ul class="footer-list">
                         <li><a href="/releases/current/index.html">Index</a></li>
                         <li><a href="/releases/current/javadocs/index.html">Javadoc</a></li>
                         <li><a href="/releases/current/FAQ.html">FAQ</a></li>
                     </ul>
                 </div>
             </div>
         </div>
         <hr/>
         <div class="row">
             <div class="col-md-12">
                 <p align="center">Copyright © 2021 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.
                     <br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation.
                     <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
             </div>
         </div>
     </div>
 </footer>
 <!--Footer End-->
 <!-- Scroll to top -->
 <span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span>

 </body>

 </html>