| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| |
| <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"> |
| <link rel="icon" href="/favicon.ico" type="image/x-icon"> |
| |
| <title>Performance Tuning</title> |
| |
| <!-- Bootstrap core CSS --> |
| <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> |
| <!-- Bootstrap theme --> |
| <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> |
| |
| <!-- Custom styles for this template --> |
| <link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css"> |
| <link href="/css/style.css" rel="stylesheet"> |
| <link href="/assets/css/owl.theme.css" rel="stylesheet"> |
| <link href="/assets/css/owl.carousel.css" rel="stylesheet"> |
| <script type="text/javascript" src="/assets/js/jquery.min.js"></script> |
| <script type="text/javascript" src="/assets/js/bootstrap.min.js"></script> |
| <script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script> |
| <script type="text/javascript" src="/assets/js/storm.js"></script> |
| <!-- Just for debugging purposes. Don't actually copy these 2 lines! --> |
| <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> |
| |
| <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> |
| <!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| |
| |
| <body> |
| <header> |
| <div class="container-fluid"> |
| <div class="row"> |
| <div class="col-md-5"> |
| <a href="/index.html"><img src="/images/logo.png" class="logo" /></a> |
| </div> |
| <div class="col-md-5"> |
| |
| <h1>Version: 2.3.0</h1> |
| |
| </div> |
| <div class="col-md-2"> |
| <a href="/downloads.html" class="btn-std btn-block btn-download">Download</a> |
| </div> |
| </div> |
| </div> |
| </header> |
| <!--Header End--> |
| <!--Navigation Begin--> |
| <div class="navbar" role="banner"> |
| <div class="container-fluid"> |
| <div class="navbar-header"> |
| <button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse"> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| </div> |
| <nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation"> |
| <ul class="nav navbar-nav"> |
| <li><a href="/index.html" id="home">Home</a></li> |
| <li><a href="/getting-help.html" id="getting-help">Getting Help</a></li> |
| <li><a href="/about/integrates.html" id="project-info">Project Information</a></li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" id="documentation">Documentation <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| |
| |
| <li><a href="/releases/2.3.0/index.html">2.3.0</a></li> |
| |
| |
| |
| <li><a href="/releases/2.2.1/index.html">2.2.1</a></li> |
| |
| |
| |
| <li><a href="/releases/2.2.0/index.html">2.2.0</a></li> |
| |
| |
| |
| <li><a href="/releases/2.1.1/index.html">2.1.1</a></li> |
| |
| |
| |
| <li><a href="/releases/2.1.0/index.html">2.1.0</a></li> |
| |
| |
| |
| <li><a href="/releases/2.0.0/index.html">2.0.0</a></li> |
| |
| |
| |
| <li><a href="/releases/1.2.4/index.html">1.2.4</a></li> |
| |
| |
| |
| <li><a href="/releases/1.2.3/index.html">1.2.3</a></li> |
| |
| |
| </ul> |
| </li> |
| <li><a href="/talksAndVideos.html">Talks and Slideshows</a></li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li> |
| <li><a href="/contribute/People.html">People</a></li> |
| <li><a href="/contribute/BYLAWS.html">ByLaws</a></li> |
| <li><a href="/Powered-By.html">PoweredBy</a></li> |
| </ul> |
| </li> |
| <li><a href="/2021/10/14/storm211-released.html" id="news">News</a></li> |
| </ul> |
| </nav> |
| </div> |
| </div> |
| |
| |
| |
| <div class="container-fluid"> |
| <h1 class="page-title">Performance Tuning</h1> |
| <div class="row"> |
| <div class="col-md-12"> |
| <!-- Documentation --> |
| |
| <p class="post-meta"></p> |
| |
| <div class="documentation-content"><p>Latency, throughput and resource consumption are the three key dimensions involved in performance tuning. |
| In the following sections we discuss the settings that can used to tune along these dimension and understand the trade-offs.</p> |
| |
| <p>It is important to understand that these settings can vary depending on the topology, the type of hardware and the number of hosts used by the topology.</p> |
| |
| <h2 id="1-buffer-size">1. Buffer Size</h2> |
| |
| <p>Spouts and Bolts operate asynchronously using message passing. Message queues used for this purpose are of fixed but configurable size. Buffer size |
| refers to the size of these queues. Every consumer has its own receive queue. The messages wait in the queue until the consumer is ready to process them. |
| The queue will typically be almost empty or almost full depending whether the consumer is operating faster or slower than the rate at which producers |
| are generating messages for it. Storm queues always have a single consumer and potentially multiple producers. There are two buffer size settings |
| of interest:</p> |
| |
| <ul> |
| <li><code>topology.executor.receive.buffer.size</code> : This is the size of the message queue employed for each spout and bolt executor.</li> |
| <li><code>topology.transfer.buffer.size</code> : This is the size of the outbound message queue used for inter-worker messaging. This queue is referred to as |
| the <em>Worker Transfer Queue</em>.</li> |
| </ul> |
| |
| <p><strong>Note:</strong> If the specified buffer size is not a power of 2, it is internally rounded up to the next power of 2.</p> |
| |
| <h4 id="guidance">Guidance</h4> |
| |
| <p>Very small message queues (size < 1024) are likely to hamper throughput by not providing enough isolation between the consumer and producer. This |
| can affect the asynchronous nature of the processing as the producers are likely to find the downstream queue to be full.</p> |
| |
| <p>Very large message queues are also not desirable to deal with slow consumers. Better to employ more consumers (i.e. bolts) on additional CPU cores instead. If queues |
| are large and often full, the messages will end up waiting longer in these queues at each step of the processing, leading to poor latency being |
| reported on the Storm UI. Large queues also imply higher memory consumption especially if the queues are typically full.</p> |
| |
| <h2 id="2-batch-size">2. Batch Size</h2> |
| |
| <p>Producers can either write a batch of messages to the consumer's queue or write each message individually. This batch size can be configured. |
| Inserting messages in batches to downstream queues helps reduce the number of synchronization operations required for the inserts. Consequently this helps achieve higher throughput. However, |
| sometimes it may take a little time for the buffer to fill up, before it is flushed into the downstream queue. This implies that the buffered messages |
| will take longer to become visible to the downstream consumer who is waiting to process them. This can increase the average end-to-end latency for |
| these messages. The latency can get very bad if the batch sizes are large and the topology is not experiencing high traffic.</p> |
| |
| <ul> |
| <li><p><code>topology.producer.batch.size</code> : The batch size for writes into the receive queue of any spout/bolt is controlled via this setting. This setting |
| impacts the communication within a worker process. Each upstream producer maintains a separate batch to a component's receive queue. So if two spout |
| instances are writing to the same downstream bolt instance, each of the spout instances will have maintain a separate batch.</p></li> |
| <li><p><code>topology.transfer.batch.size</code> : Messages that are destined to a spout/bolt running on a different worker process, are sent to a queue called |
| the <strong>Worker Transfer Queue</strong>. The Worker Transfer Thread is responsible for draining the messages in this queue and send them to the appropriate |
| worker process over the network. This setting controls the batch size for writes into the Worker Transfer Queue. This impacts the communication |
| between worker processes.</p></li> |
| </ul> |
| |
| <h4 id="guidance">Guidance</h4> |
| |
| <p><strong>For Low latency:</strong> Set batch size to 1. This basically disables batching. This is likely to reduce peak sustainable throughput under heavy traffic, but |
| not likely to impact throughput much under low/medium traffic situations.</p> |
| |
| <p><strong>For High throughput:</strong> Set batch size > 1. Try values like 10, 100, 1000 or even higher and see what yields the best throughput for the topology. |
| Beyond a certain point the throughput is likely to get worse.</p> |
| |
| <p><strong>Varying throughput:</strong> Topologies often experience fluctuating amounts of incoming traffic over the day. Other topos may experience higher traffic in some |
| paths and lower throughput in other paths simultaneously. If latency is not a concern, a small bach size (e.g. 10) and in conjunction with the right flush |
| frequency may provide a reasonable compromise for such scenarios. For meeting stricter latency SLAs, consider setting it to 1.</p> |
| |
| <h2 id="3-flush-tuple-frequency">3. Flush Tuple Frequency</h2> |
| |
| <p>In low/medium traffic situations or when batch size is too large, the batches may take too long to fill up and consequently the messages could take unacceptably |
| long time to become visible to downstream components. In such case, periodic flushing of batches is necessary to keep the messages moving and avoid compromising |
| latencies when batching is enabled.</p> |
| |
| <p>When batching has been enabled, special messages called <em>flush tuples</em> are inserted periodically into the receive queues of all spout and bolt instances. |
| This causes each spout/bolt instance to flush all its outstanding batches to their respective downstream components.</p> |
| |
| <p><code>topology.flush.tuple.freq.millis</code> : This setting controls how often the flush tuples are generated. Flush tuples are not generated if this configuration is |
| set to 0 or if (<code>topology.producer.batch.size</code>=1 and <code>topology.transfer.batch.size</code>=1).</p> |
| |
| <h4 id="guidance">Guidance</h4> |
| |
| <p>Flushing interval can be used as tool to retain the higher throughput benefits of batching and avoid batched messages getting stuck for too long waiting for their. |
| batch to fill. Preferably this value should be larger than the average execute latencies of the bolts in the topology. Trying to flush the queues more frequently than |
| the amount of time it takes to produce the messages may hurt performance. Understanding the average execute latencies of each bolt will help determine the average |
| number of messages in the queues between two flushes.</p> |
| |
| <p><strong>For Low latency:</strong> A smaller value helps achieve tighter latency SLAs.</p> |
| |
| <p><strong>For High throughput:</strong> When trying to maximize throughput under high traffic situations, the batches are likely to get filled and flushed automatically. |
| To optimize for such cases, this value can be set to a higher number.</p> |
| |
| <p><strong>Varying throughput:</strong> If latency is not a concern, a larger value will optimize for high traffic situations. For meeting tighter SLAs set this to lower |
| values.</p> |
| |
| <h2 id="4-wait-strategy">4. Wait Strategy</h2> |
| |
| <p>Wait strategies are used to conserve CPU usage by trading off some latency and throughput. They are applied for the following situations:</p> |
| |
| <p>4.1 <strong>Spout Wait:</strong> In low/no traffic situations, Spout's nextTuple() may not produce any new emits. To prevent invoking the Spout's nextTuple too often, |
| this wait strategy is used between nextTuple() calls, allowing the spout's executor thread to idle and conserve CPU. Spout wait strategy is also used |
| when the <code>topology.max.spout.pending</code> limit has been reached when ACKers are enabled. Select a strategy using <code>topology.spout.wait.strategy</code>. Configure the |
| chosen wait strategy using one of the <code>topology.spout.wait.*</code> settings.</p> |
| |
| <p>4.2 <strong>Bolt Wait:</strong> : When a bolt polls it's receive queue for new messages to process, it is possible that the queue is empty. This typically happens |
| in case of low/no traffic situations or when the upstream spout/bolt is inherently slower. This wait strategy is used in such cases. It avoids high CPU usage |
| due to the bolt continuously checking on a typically empty queue. Select a strategy using <code>topology.bolt.wait.strategy</code>. The chosen strategy can be further configured |
| using the <code>topology.bolt.wait.*</code> settings.</p> |
| |
| <p>4.3 <strong>Backpressure Wait</strong> : Select a strategy using <code>topology.backpressure.wait.strategy</code>. When a spout/bolt tries to write to a downstream component's receive queue, |
| there is a possibility that the queue is full. In such cases the write needs to be retried. This wait strategy is used to induce some idling in-between re-attempts for |
| conserving CPU. The chosen strategy can be further configured using the <code>topology.backpressure.wait.*</code> settings.</p> |
| |
| <h4 id="built-in-wait-strategies">Built-in wait strategies:</h4> |
| |
| <p>These wait strategies are available for use with all of the above mentioned wait situations.</p> |
| |
| <ul> |
| <li><strong>ProgressiveWaitStrategy</strong> : This strategy can be used for Bolt Wait or Backpressure Wait situations. Set the strategy to 'org.apache.storm.policy.WaitStrategyProgressive' to |
| select this wait strategy. This is a dynamic wait strategy that enters into progressively deeper states of CPU conservation if the Backpressure Wait or Bolt Wait situations persist. |
| It has 3 levels of idling and allows configuring how long to stay at each level :</li> |
| </ul> |
| |
| <ol> |
| <li><p>Level1 / No Waiting - The first few times it will return immediately. This does not conserve any CPU. The number of times it remains in this state is configured using |
| <code>topology.spout.wait.progressive.level1.count</code> or <code>topology.bolt.wait.progressive.level1.count</code> or <code>topology.backpressure.wait.progressive.level1.count</code> depending which |
| situation it is being used.</p></li> |
| <li><p>Level 2 / Park Nanos - In this state it disables the current thread for thread scheduling purposes, for 1 nano second using LockSupport.parkNanos(). This puts the CPU in a minimal |
| conservation state. It remains in this state for <code>topology.spout.wait.progressive.level2.count</code> or <code>topology.bolt.wait.progressive.level2.count</code> or |
| <code>topology.backpressure.wait.progressive.level2.count</code> iterations.</p></li> |
| <li><p>Level 3 / Thread.sleep() - In this level it calls Thread.sleep() with the value specified in <code>topology.spout.wait.progressive.level3.sleep.millis</code> or |
| <code>topology.bolt.wait.progressive.level3.sleep.millis</code> or <code>topology.backpressure.wait.progressive.level3.sleep.millis</code>. This is the most CPU conserving level and it remains in |
| this level for the remaining iterations.</p></li> |
| </ol> |
| |
| <ul> |
| <li><strong>ParkWaitStrategy</strong> : This strategy can be used for Bolt Wait or Backpressure Wait situations. Set the strategy to <code>org.apache.storm.policy.WaitStrategyPark</code> to use this. |
| This strategy disables the current thread for thread scheduling purposes by calling LockSupport.parkNanos(). The amount of park time is configured using either |
| <code>topology.bolt.wait.park.microsec</code> or <code>topology.backpressure.wait.park.microsec</code> based on the wait situation it is used. Setting the park time to 0, effectively disables |
| invocation of LockSupport.parkNanos and this mode can be used to achieve busy polling (which at the cost of high CPU utilization even when idle, may improve latency and/or throughput).</li> |
| </ul> |
| |
| <h2 id="5-max-spout-pending">5. Max.spout.pending</h2> |
| |
| <p>The setting <code>topology.max.spout.pending</code> limits the number of un-ACKed tuples at the spout level. Once a spout reaches this limit, the spout's nextTuple() |
| method will not be called until some ACKs are received for the outstanding emits. This setting does not have any affect if ACKing is disabled. It |
| is a spout throttling mechanism which can impact throughput and latency. Setting it to null disables it for storm-core topologies. Impact on throughput |
| is dependent on the topology and its concurrency (workers/executors), so experimentation is necessary to determine optimal setting. Latency and memory consumption |
| is expected to typically increase with higher and higher values for this.</p> |
| |
| <h2 id="6-load-aware-messaging">6. Load Aware messaging</h2> |
| |
| <p>When load aware messaging is enabled (default), shuffle grouping takes additional factors into consideration for message routing. |
| Impact of this on performance is dependent on the topology and its deployment footprint (i.e. distribution over process and machines). |
| Consequently it is useful to assess the impact of setting <code>topology.disable.loadaware.messaging</code> to <code>true</code> or <code>false</code> for your |
| specific case.</p> |
| |
| <h2 id="7-sampling-rate">7. Sampling Rate</h2> |
| |
| <p>Sampling rate is used to control how often certain metrics are computed on the Spout and Bolt executors. This is configured using <code>topology.stats.sample.rate</code> |
| Setting it to 1 means, the stats are computed for every emitted message. As an example, to sample once every 1000 messages it can be set to 0.001. It may be |
| possible to improve throughput and latency by reducing the sampling rate.</p> |
| |
| <h2 id="8-budgeting-cpu-cores-for-executors">8. Budgeting CPU cores for Executors</h2> |
| |
| <p>There are three main types of executors (i.e threads) to take into account when budgeting CPU cores for them. Spout Executors, Bolt Executors, Worker Transfer (handles outbound |
| messages) and NettyWorker (handles inbound messages). |
| The first two are used to run spout, bolt and acker instances. The Worker Transfer thread is used to serialize and send messages to other workers (in multi-worker mode).</p> |
| |
| <p>Executors that are expected to remain busy, either because they are handling a lot of messages, or because their processing is inherently CPU intensive, should be allocated |
| 1 physical core each. Allocating logical cores (instead of physical) or less than 1 physical core for CPU intensive executors increases CPU contention and performance can suffer. |
| Executors that are not expected to be busy can be allocated a smaller fraction of the physical core (or even logical cores). It maybe not be economical to allocate a full physical |
| core for executors that are not likely to saturate the CPU.</p> |
| |
| <p>The <em>system bolt</em> generally processes very few messages per second, and so requires very little cpu (typically less than 10% of a physical core).</p> |
| |
| <h2 id="9-garbage-collection">9. Garbage Collection</h2> |
| |
| <p>Choice of GC is an important concern for topologies that are latency or throughput sensitive. It is recommended to try the both the CMS and G1 collectors. Performance characteristics |
| of the collectors can change between single and multiworker modes and is dependent on hardware characteristics such as number of CPUs and memory localities. Number of GC threads can |
| also affect performance. Sometimes fewer GC threads can yield better performance. It is advisable to select a collector and tune it by mimicking anticipated peak data rates on hardware |
| similar to what is used in production.</p> |
| |
| <h2 id="10-scaling-out-with-single-worker-mode">10. Scaling out with Single Worker mode</h2> |
| |
| <p>Communication between executors within a worker process is very fast as there is neither a need to serialize and deserialize messages nor does it involve communicating over the network |
| stack. In multiworker mode, messages often cross worker process boundaries. For performance sensitive cases, if it is possible to configure a topology to run as many single-worker |
| instances (for ex. one worker per input partition) rather than one multiworker instance, it may yield significantly better throughput and latency on the same hardware. |
| The downside to this approach is that it adds the overhead of monitoring and managing many instances rather than one multiworker instance.</p> |
| </div> |
| |
| |
| </div> |
| </div> |
| </div> |
| <footer> |
| <div class="container-fluid"> |
| <div class="row"> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>Meetups</h5> |
| <ul class="latest-news"> |
| |
| <li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li> |
| |
| <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li> |
| |
| <li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li> |
| |
| <li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li> |
| |
| <li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li> |
| |
| <li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li> |
| |
| <!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> --> |
| </ul> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>About Apache Storm</h5> |
| <p>Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Apache Storm with database systems is easy.</p> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>First Look</h5> |
| <ul class="footer-list"> |
| <li><a href="/releases/current/Rationale.html">Rationale</a></li> |
| <li><a href="/releases/current/Tutorial.html">Tutorial</a></li> |
| <li><a href="/releases/current/Setting-up-development-environment.html">Setting up development environment</a></li> |
| <li><a href="/releases/current/Creating-a-new-Storm-project.html">Creating a new Apache Storm project</a></li> |
| </ul> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>Documentation</h5> |
| <ul class="footer-list"> |
| <li><a href="/releases/current/index.html">Index</a></li> |
| <li><a href="/releases/current/javadocs/index.html">Javadoc</a></li> |
| <li><a href="/releases/current/FAQ.html">FAQ</a></li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| <hr/> |
| <div class="row"> |
| <div class="col-md-12"> |
| <p align="center">Copyright © 2021 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. |
| <br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation. |
| <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> |
| </div> |
| </div> |
| </div> |
| </footer> |
| <!--Footer End--> |
| <!-- Scroll to top --> |
| <span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span> |
| |
| </body> |
| |
| </html> |
| |