| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="generator" content="Asciidoctor 2.0.18"> |
| <link rel="icon" type="image/png" href="images/favicon.png"> |
| <title>Performance Tools</title> |
| <link rel="stylesheet" href="css/asciidoctor.css"> |
| <link rel="stylesheet" href="css/font-awesome.css"> |
| <link rel="stylesheet" href="css/rouge-github.css"> |
| </head> |
| <body class="book toc2 toc-left"> |
| <div id="header"> |
| <h1>Performance Tools</h1> |
| <div id="toc" class="toc2"> |
| <div id="toctitle"><a href="index.html">User Manual for 2.32.0</a></div> |
| <ul class="sectlevel1"> |
| <li><a href="#case-1-single-producer-single-consumer-over-a-queue">1. Case 1: Single producer Single consumer over a queue</a> |
| <ul class="sectlevel2"> |
| <li><a href="#live-latency-console-reporting">1.1. Live Latency Console Reporting</a></li> |
| <li><a href="#how-to-read-the-live-statistics">1.2. How to read the live statistics?</a></li> |
| <li><a href="#how-to-compare-latencies-across-tests">1.3. How to compare latencies across tests?</a></li> |
| </ul> |
| </li> |
| <li><a href="#case-2-target-rate-single-producer-single-consumer-over-a-queue">2. Case 2: Target Rate Single producer Single consumer over a queue</a> |
| <ul class="sectlevel2"> |
| <li><a href="#what-delay-send-time-means">2.1. What does <code>delay send time</code> mean?</a></li> |
| </ul> |
| </li> |
| <li><a href="#case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers">3. Case 3: Target Rate load on 10 durable topics, each with 3 producers and 2 unshared consumers</a></li> |
| </ul> |
| </div> |
| </div> |
| <div id="content"> |
| <div id="preamble"> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>Artemis provides some built-in performance test tools based on the <a href="https://javaee.github.io/jms-spec/pages/JMS20FinalRelease">JMS 2 API</a> to help users (and developers) stress test a configured Artemis broker instance in different scenarios.</p> |
| </div> |
| <div class="paragraph"> |
| <p>These command-line tools won’t represent a full-fat benchmark (such as <a href="https://openmessaging.cloud/docs/benchmarks/">Open Messaging</a>), but can be used as building blocks to produce one. |
| They are also quite useful on their own.</p> |
| </div> |
| <div class="paragraph"> |
| <p>In summary, the provided <code>perf</code> tools are:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p><code>producer</code> tool: it can generate either all-out throughput or target-rate load, using <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/bytesmessage">BytesMessage</a>s of a configured size</p> |
| </li> |
| <li> |
| <p><code>consumer</code> tool: it uses a <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/messagelistener">MessageListener</a> to consume messages sent by the <code>producer</code> command</p> |
| </li> |
| <li> |
| <p><code>client</code> tool: it packs both tools into a single command</p> |
| </li> |
| </ol> |
| </div> |
| <div class="paragraph"> |
| <p>Most users will just need the <code>client</code> tool, but the <code>producer</code> and <code>consumer</code> tools allow performing tests in additional scenarios:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>delaying consumer start, in order to cause the broker to page</p> |
| </li> |
| <li> |
| <p>running producers and consumers on different machines</p> |
| </li> |
| <li> |
| <p>…​</p> |
| </li> |
| </ul> |
| </div> |
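| <div class="paragraph"> |
| <p>For instance, a minimal sketch of the "different machines" scenario (assuming <code>broker-host</code> as a placeholder for the broker’s hostname, and that <code>--url</code> is accepted by the <code>producer</code> and <code>consumer</code> tools as it is by <code>client</code>) could be:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"># on the consumer machine: start the consumer(s) first (or delay this step to cause the broker to page)
$ ./artemis perf consumer --url tcp://broker-host:61616 queue://TEST_QUEUE
# on the producer machine
$ ./artemis perf producer --url tcp://broker-host:61616 queue://TEST_QUEUE</code></pre> |
| </div> |
| </div> |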
| <div class="paragraph"> |
| <p>The examples below (running on a <code>64 bit Linux 5.14 with Intel® Core™ i7-9850H CPU @ 2.60GHz × 12 with Turbo Boost disabled, 32 GB of RAM and SSD</code>) show different use cases of increasing complexity. |
| As they progress, some internal architectural details of the tool and the supported configuration options are explored.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| <div class="paragraph"> |
| <p>The tools can run both from within the broker’s instance folder or from the home folder. |
| In the former case they will use the same JVM parameters configured on the instance (in <code>artemis.profile</code>), while in the latter case the user should set the <code>JAVA_ARGS</code> environment variable to override default heap and GC parameters (e.g. <code>-XX:+UseParallelGC -Xms512M -Xmx1024M</code>).</p> |
| </div> |
| </td> |
| </tr> |
| </table> |
| </div> |
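| <div class="paragraph"> |
| <p>For example, a sketch of a run from the home folder with overridden JVM parameters (the values below are just the ones suggested above, not a tuned configuration):</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"># override default heap and GC parameters before launching the tool
$ export JAVA_ARGS="-XX:+UseParallelGC -Xms512M -Xmx1024M"
$ ./artemis perf client queue://TEST_QUEUE</code></pre> |
| </div> |
| </div> |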
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="case-1-single-producer-single-consumer-over-a-queue"><a class="anchor" href="#case-1-single-producer-single-consumer-over-a-queue"></a><a class="link" href="#case-1-single-producer-single-consumer-over-a-queue">1. Case 1: Single producer Single consumer over a queue</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>This is the simplest possible case: running a load test with 1 producer and 1 consumer on a non-durable queue <code>TEST_QUEUE</code>, using <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/deliverymode#NON_PERSISTENT">non-persistent</a> messages 1024 bytes long (by default), with <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/session#AUTO_ACKNOWLEDGE">auto-acknowledge</a>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Let’s see what happens after typing:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client queue://TEST_QUEUE |
| Connection brokerURL <span class="o">=</span> tcp://localhost:61616 |
| 2022-01-18 10:30:54,535 WARN <span class="o">[</span>org.apache.activemq.artemis.core.client] AMQ212053: CompletionListener/SendAcknowledgementHandler used with <span class="nv">confirmationWindowSize</span><span class="o">=</span><span class="nt">-1</span><span class="nb">.</span> Enable confirmationWindowSize to receive acks from server! |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 7316 msg/sec |
| <span class="nt">---</span> blocked: 6632 msg/sec |
| <span class="nt">---</span> completed: 7320 msg/sec |
| <span class="nt">---</span> received: 7317 msg/sec |
| <span class="c"># ...</span></code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The test keeps running until a <code>SIGTERM</code> or <code>SIGINT</code> signal is sent to the Java process (on a Linux console this translates into pressing <strong>CTRL + C</strong>). |
| Before looking at what the metrics mean, there’s an initial <code>WARN</code> log that shouldn’t be ignored:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">WARN <span class="o">[</span>org.apache.activemq.artemis.core.client] AMQ212053: CompletionListener/SendAcknowledgementHandler used with <span class="nv">confirmationWindowSize</span><span class="o">=</span><span class="nt">-1</span><span class="nb">.</span> Enable confirmationWindowSize to receive acks from server!</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>It shows two things:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p>the load generator uses <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/messageproducer#send-javax.jms.Destination-javax.jms.Message-javax.jms.CompletionListener-">async message producers</a></p> |
| </li> |
| <li> |
| <p><code>confirmationWindowSize</code> is an Artemis CORE protocol specific setting; |
| the <code>perf</code> commands use CORE as the default JMS provider</p> |
| </li> |
| </ol> |
| </div> |
| <div class="sect2"> |
| <h3 id="live-latency-console-reporting"><a class="anchor" href="#live-latency-console-reporting"></a><a class="link" href="#live-latency-console-reporting">1.1. Live Latency Console Reporting</a></h3> |
| <div class="paragraph"> |
| <p>The <code>perf client</code> command can report different latency percentile metrics on the console by adding <code>--show-latency</code> to the command arguments. In order to obtain meaningful metrics, we need to address the <code>WARN</code> by setting <code>confirmationWindowSize</code> on the producer <code>url</code>, while setting <code>--consumer-url</code> avoids applying the same configuration to the consumer(s).</p> |
| </div> |
| <div class="paragraph"> |
| <p>In short, the command is using these additional parameters:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616</code></pre> |
| </div> |
| </div> |
| <div class="sect3"> |
| <h4 id="running-it"><a class="anchor" href="#running-it"></a><a class="link" href="#running-it">1.1.1. Running it</a></h4> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 queue://TEST_QUEUE |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 8114 msg/sec |
| <span class="nt">---</span> blocked: 8114 msg/sec |
| <span class="nt">---</span> completed: 8114 msg/sec |
| <span class="nt">---</span> received: 8113 msg/sec |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 113.01 us - 50.00%: 106.00 us - 90.00%: 142.00 us - 99.00%: 204.00 us - 99.90%: 371.00 us - 99.99%: 3455.00 us - max: 3455.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 213.71 us - 50.00%: 126.00 us - 90.00%: 177.00 us - 99.00%: 3439.00 us - 99.90%: 7967.00 us - 99.99%: 8895.00 us - max: 8895.00 us |
| <span class="c"># CTRL + C pressed</span> |
| <span class="nt">---</span> SUMMARY |
| <span class="nt">---</span> result: success |
| <span class="nt">---</span> total sent: 70194 |
| <span class="nt">---</span> total blocked: 70194 |
| <span class="nt">---</span> total completed: 70194 |
| <span class="nt">---</span> total received: 70194 |
| <span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 101.53 us - 50.00%: 86.00 us - 90.00%: 140.00 us - 99.00%: 283.00 us - 99.90%: 591.00 us - 99.99%: 2007.00 us - max: 24959.00 us |
| <span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 127.48 us - 50.00%: 97.00 us - 90.00%: 166.00 us - 99.00%: 449.00 us - 99.90%: 4671.00 us - 99.99%: 8255.00 us - max: 27263.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Some notes:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p><code>WARN</code> message is now gone</p> |
| </li> |
| <li> |
| <p><code>send ack time</code> and <code>transfer time</code> statistics are printed at one-second intervals</p> |
| </li> |
| <li> |
| <p><code>total</code> and <code>aggregated</code> metrics are printed on test completion (more on this later)</p> |
| </li> |
| </ol> |
| </div> |
| <div class="paragraph"> |
| <p>The meaning of the live latency statistics is:</p> |
| </div> |
| <div class="dlist"> |
| <dl> |
| <dt class="hdlist1">send ack time</dt> |
| <dd> |
| <p>percentiles of latency to acknowledge sent messages</p> |
| </dd> |
| <dt class="hdlist1">transfer time</dt> |
| <dd> |
| <p>percentiles of latency to transfer messages from producer(s) to consumer(s)</p> |
| </dd> |
| </dl> |
| </div> |
| <div class="paragraph"> |
| <p>The <code>perf</code> commands use <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/messageproducer#send-javax.jms.Destination-javax.jms.Message-javax.jms.CompletionListener-">JMS 2 async message producers</a> that allow the load generator to accumulate in-flight sent messages and, depending on the protocol implementation, may block the producer thread due to producer flow control, |
| e.g. the Artemis CORE protocol can block producer threads while refilling producer credits, while <a href="https://qpid.apache.org/components/jms/index.html">QPID-JMS</a> won’t.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The <code>perf</code> tool implements its own in-flight sent-request tracking and can be configured to limit the number of pending sent messages, while reporting the rate at which producers are "blocked" awaiting completions.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| <div class="title">Producer threads are <code>blocked</code>?</div> |
| <div class="paragraph"> |
| <p>Although the load back-pressure mechanism is non-blocking, the load generator cannot push further load while back-pressured by the protocol client, so the load is semantically "blocked". |
| This detail is relevant to explain the live rate <a href="#running-it">statistics</a> on Console:</p> |
| </div> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="paragraph"> |
| <p>By default, the <code>perf</code> tools (i.e. <code>client</code> and <code>producer</code>) <strong>limit the number of in-flight requests to 1</strong>: to change the default setting, users should add the <code>--max-pending</code> parameter.</p> |
| </div> |
| <div class="admonitionblock warning"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-warning" title="Warning"></i> |
| </td> |
| <td class="content"> |
| <div class="paragraph"> |
| <p>Setting <code>--max-pending 0</code> will disable the load generator in-flight sent messages limiter, allowing the tool to accumulate an unbounded number of in-flight messages, risking <code>OutOfMemoryError</code>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>This is <strong>NOT RECOMMENDED!</strong></p> |
| </div> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="paragraph"> |
| <p>More detail on the metrics:</p> |
| </div> |
| <div class="dlist"> |
| <dl> |
| <dt class="hdlist1">warmup</dt> |
| <dd> |
| <p>Whether the generator is in its warm-up phase while the statistics sample is collected; the warm-up duration can be set with <code>--warmup</code></p> |
| </dd> |
| <dt class="hdlist1">sent</dt> |
| <dd> |
| <p>The message sent rate</p> |
| </dd> |
| <dt class="hdlist1">blocked</dt> |
| <dd> |
| <p>The rate of attempts to send a new message, "blocked" awaiting <code>--max-pending</code> refill</p> |
| </dd> |
| <dt class="hdlist1">completed</dt> |
| <dd> |
| <p>The rate of message send acknowledgements received by producer(s)</p> |
| </dd> |
| <dt class="hdlist1">received</dt> |
| <dd> |
| <p>The rate of messages received by consumer(s)</p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="how-to-read-the-live-statistics"><a class="anchor" href="#how-to-read-the-live-statistics"></a><a class="link" href="#how-to-read-the-live-statistics">1.2. How to read the live statistics?</a></h3> |
| <div class="paragraph"> |
| <p>The huge amount of <code>blocked</code> vs <code>sent</code> means that the broker wasn’t fast enough to refill the single <code>--max-pending</code> budget before a new message could be sent. |
| It can be changed into:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--max-pending</span> 100</code></pre> |
| </div> |
| </div> |
| <div id="our-previous-command" class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 queue://TEST_QUEUE |
| Connection brokerURL <span class="o">=</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 |
| <span class="c"># first samples show very BAD performance because the client JVM is still warming up</span> |
| <span class="nt">---</span> warmup <span class="nb">true</span> |
| <span class="nt">---</span> sent: 27366 msg/sec |
| <span class="nt">---</span> blocked: 361 msg/sec |
| <span class="nt">---</span> completed: 27305 msg/sec |
| <span class="nt">---</span> received: 26195 msg/sec |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 1743.39 us - 50.00%: 1551.00 us - 90.00%: 3119.00 us - 99.00%: 5215.00 us - 99.90%: 8575.00 us - 99.99%: 8703.00 us - max: 23679.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 11860.32 us - 50.00%: 11583.00 us - 90.00%: 18559.00 us - 99.00%: 24319.00 us - 99.90%: 31359.00 us - 99.99%: 31615.00 us - max: 31615.00 us |
| <span class="c"># ... > 20 seconds later ...</span> |
| <span class="c"># performance is now way better than during warmup</span> |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 86525 msg/sec |
| <span class="nt">---</span> blocked: 5734 msg/sec |
| <span class="nt">---</span> completed: 86525 msg/sec |
| <span class="nt">---</span> received: 86556 msg/sec |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 1109.13 us - 50.00%: 1103.00 us - 90.00%: 1447.00 us - 99.00%: 1687.00 us - 99.90%: 5791.00 us - 99.99%: 5983.00 us - max: 5983.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 4662.94 us - 50.00%: 1679.00 us - 90.00%: 12159.00 us - 99.00%: 14079.00 us - 99.90%: 14527.00 us - 99.99%: 14783.00 us - max: 14783.00 us |
| <span class="c"># CTRL + C</span> |
| <span class="nt">---</span> SUMMARY |
| <span class="nt">---</span> result: success |
| <span class="nt">---</span> total sent: 3450389 |
| <span class="nt">---</span> total blocked: 168863 |
| <span class="nt">---</span> total completed: 3450389 |
| <span class="nt">---</span> total received: 3450389 |
| <span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 1056.09 us - 50.00%: 1003.00 us - 90.00%: 1423.00 us - 99.00%: 1639.00 us - 99.90%: 4287.00 us - 99.99%: 7103.00 us - max: 19583.00 us |
| <span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 18647.51 us - 50.00%: 10751.00 us - 90.00%: 54271.00 us - 99.00%: 84991.00 us - 99.90%: 90111.00 us - 99.99%: 93183.00 us - max: 94207.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Some notes on the results:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>we now have a reasonable <code>blocked/sent</code> ratio (&lt; ~10%)</p> |
| </li> |
| <li> |
| <p>sent rate has improved <strong>ten-fold</strong> if compared to <a href="#running-it">previous results</a></p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>And on the <code>SUMMARY</code> statistics:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><code>total</code> counters include measurements collected with <code>warmup true</code></p> |
| </li> |
| <li> |
| <p><code>aggregated</code> latencies <strong>don’t</strong> include measurements collected with <code>warmup true</code></p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="how-to-compare-latencies-across-tests"><a class="anchor" href="#how-to-compare-latencies-across-tests"></a><a class="link" href="#how-to-compare-latencies-across-tests">1.3. How to compare latencies across tests?</a></h3> |
| <div class="paragraph"> |
| <p>The Console output format isn’t designed for easy latency comparisons, however the <code>perf</code> commands expose the <code>--hdr &lt;hdr file name&gt;</code> parameter to produce a <a href="http://hdrhistogram.org/">HDR Histogram</a> compatible report that can be opened with different visualizers, e.g. <a href="https://hdrhistogram.github.io/HdrHistogramJSDemo/logparser.html">Online HdrHistogram Log Analyzer</a>, <a href="https://github.com/ennerf/HdrHistogramVisualizer">HdrHistogramVisualizer</a> or <a href="https://github.com/HdrHistogram/HistogramLogAnalyzer">HistogramLogAnalyzer</a>.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| All latency traces collected in this guide use <a href="https://hdrhistogram.github.io/HdrHistogramJSDemo/logparser.html">Online HdrHistogram Log Analyzer</a> as the HDR Histogram visualizer tool. |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="paragraph"> |
| <p>Below is the visualization of the HDR histograms collected while adding the following to the previous benchmark:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--hdr</span> /tmp/non_durable_queue.hdr</code></pre> |
| </div> |
| </div> |
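| <div class="paragraph"> |
| <p>Putting it together with the previous command, the full invocation would look like:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">$ ./artemis perf client --warmup 20 --max-pending 100 --show-latency --hdr /tmp/non_durable_queue.hdr \
  --url tcp://localhost:61616?confirmationWindowSize=20000 --consumer-url tcp://localhost:61616 queue://TEST_QUEUE</code></pre> |
| </div> |
| </div> |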
| <div class="paragraph"> |
| <p>Whole test execution shows tagged latencies, to distinguish <code>warmup</code> ones:</p> |
| </div> |
| <div class="imageblock"> |
| <div class="content"> |
| <img src="images/test.png" alt="test"> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Filtering out <code>warmup</code> latencies, it looks like:</p> |
| </div> |
| <div class="imageblock"> |
| <div class="content"> |
| <img src="images/hot_test.png" alt="hot test"> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The latency results show that at higher percentiles the <code>transfer</code> latency is way higher than the <code>send ack</code> one (reminder: <code>send ack time</code> is the time to acknowledge sent messages), probably meaning that some queuing-up is happening on the broker.</p> |
| </div> |
| <div class="paragraph"> |
| <p>In order to test this theory we switch to <strong>target rate tests</strong>.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="case-2-target-rate-single-producer-single-consumer-over-a-queue"><a class="anchor" href="#case-2-target-rate-single-producer-single-consumer-over-a-queue"></a><a class="link" href="#case-2-target-rate-single-producer-single-consumer-over-a-queue">2. Case 2: Target Rate Single producer Single consumer over a queue</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p><code>perf client</code> and <code>perf producer</code> tools allow specifying a target rate to schedule producer(s) requests: adding</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--rate</span> &lt;msg/sec integer value&gt;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The <a href="#our-previous-command">last run</a> of the previous example shows that <code>--max-pending 100</code> guarantees &lt; 10% blocked/sent messages, with aggregated latencies:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 1056.09 us - 50.00%: 1003.00 us - 90.00%: 1423.00 us - 99.00%: 1639.00 us - 99.90%: 4287.00 us - 99.99%: 7103.00 us - max: 19583.00 us |
| <span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 18647.51 us - 50.00%: 10751.00 us - 90.00%: 54271.00 us - 99.00%: 84991.00 us - 99.90%: 90111.00 us - 99.99%: 93183.00 us - max: 94207.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>We would like to lower <code>transfer time</code> to sub-millisecond values; |
| let’s try by running a load test at ~30% of the maximum perceived sent rate, by setting:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--rate</span> 30000 <span class="nt">--hdr</span> /tmp/30K.hdr</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The whole command is then:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client <span class="nt">--rate</span> 30000 <span class="nt">--hdr</span> /tmp/30K.hdr <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 queue://TEST_QUEUE |
| <span class="c"># ... after 20 warmup seconds ...</span> |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 30302 msg/sec |
| <span class="nt">---</span> blocked: 0 msg/sec |
| <span class="nt">---</span> completed: 30302 msg/sec |
| <span class="nt">---</span> received: 30303 msg/sec |
| <span class="nt">---</span> send delay <span class="nb">time</span>: mean: 24.20 us - 50.00%: 21.00 us - 90.00%: 54.00 us - 99.00%: 72.00 us - 99.90%: 233.00 us - 99.99%: 659.00 us - max: 731.00 us |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 150.48 us - 50.00%: 120.00 us - 90.00%: 172.00 us - 99.00%: 1223.00 us - 99.90%: 2543.00 us - 99.99%: 3183.00 us - max: 3247.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 171.53 us - 50.00%: 135.00 us - 90.00%: 194.00 us - 99.00%: 1407.00 us - 99.90%: 2607.00 us - 99.99%: 3151.00 us - max: 3183.00 us |
| <span class="c"># CTRL + C</span> |
| <span class="nt">---</span> SUMMARY |
| <span class="nt">---</span> result: success |
| <span class="nt">---</span> total sent: 1216053 |
| <span class="nt">---</span> total blocked: 845 |
| <span class="nt">---</span> total completed: 1216053 |
| <span class="nt">---</span> total received: 1216053 |
| <span class="nt">---</span> aggregated delay send <span class="nb">time</span>: mean: 35.84 us - 50.00%: 20.00 us - 90.00%: 55.00 us - 99.00%: 116.00 us - 99.90%: 3359.00 us - 99.99%: 5503.00 us - max: 6495.00 us |
| <span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 147.38 us - 50.00%: 117.00 us - 90.00%: 165.00 us - 99.00%: 991.00 us - 99.90%: 4191.00 us - 99.99%: 5695.00 us - max: 7103.00 us |
| <span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 178.48 us - 50.00%: 134.00 us - 90.00%: 188.00 us - 99.00%: 1359.00 us - 99.90%: 5471.00 us - 99.99%: 8831.00 us - max: 12799.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>We’ve now achieved sub-millisecond <code>transfer</code> latencies up to the <code>90.00 percentile</code>. |
| Opening <code>/tmp/30K.hdr</code> makes it easier to see:</p> |
| </div> |
| <div class="imageblock"> |
| <div class="content"> |
| <img src="images/30K.png" alt="test"> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Now the <code>send</code> and <code>transfer</code> times look quite similar and there’s no sign of queueing, but…​</p> |
| </div> |
| <div class="sect2"> |
| <h3 id="what-delay-send-time-means"><a class="anchor" href="#what-delay-send-time-means"></a><a class="link" href="#what-delay-send-time-means">2.1. What does <code>delay send time</code> mean?</a></h3> |
| <div class="paragraph"> |
| <p>This metric is borrowed from the <a href="http://highscalability.com/blog/2015/10/5/your-load-generator-is-probably-lying-to-you-take-the-red-pi.html">Coordinated Omission</a> concept, and it measures the delay of producer(s) while trying to send messages at the requested rate.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The source of such delay could be:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>slow responding broker: the load generator reached <code>--max-pending</code> and the expected rate cannot be honored</p> |
| </li> |
| <li> |
| <p>client running out of resources (lack of CPU time, GC pauses, etc.): the load generator cannot keep up with the expected rate because it is just "too fast" for it</p> |
| </li> |
| <li> |
| <p>protocol-dependent blocking behaviours: CORE JMS 2 async send can block due to <code>producerWindowSize</code> exhaustion</p> |
| </li> |
| </ul> |
| </div> |
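| <div class="paragraph"> |
| <p>For the last case, a hedged sketch: assuming the CORE <code>producerWindowSize</code> URL parameter is the relevant knob (an assumption worth verifying against the broker documentation for your version), the producer window could be enlarged directly on the connection URL; note the quoting needed because of the <code>&amp;</code> in the shell:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">$ ./artemis perf client --rate 30000 --max-pending 100 --show-latency \
  --url "tcp://localhost:61616?confirmationWindowSize=20000&amp;producerWindowSize=1048576" \
  --consumer-url tcp://localhost:61616 queue://TEST_QUEUE</code></pre> |
| </div> |
| </div> |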
| <div class="paragraph"> |
| <p>A sane run of a target rate test should keep <code>delay send time</code> under control; otherwise, investigation is needed to understand the source of the delay. |
| Let’s show it with an example: we’ve already measured the all-out rate of the broker, i.e. ~90K msg/sec.</p> |
| </div> |
| <div class="paragraph"> |
| <p>By running a <code>--rate 90000</code> test under the same conditions, latencies will look like:</p> |
| </div> |
| <div class="imageblock"> |
| <div class="content"> |
| <img src="images/90K.png" alt="test"> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>It clearly shows that the load generator is getting delayed and cannot keep up with the expected rate.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Below is a more complex example involving destination (auto)generation with "asymmetric" load, i.e. the number of producers differs from the number of consumers.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers"><a class="anchor" href="#case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers"></a><a class="link" href="#case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers">3. Case 3: Target Rate load on 10 durable topics, each with 3 producers and 2 unshared consumers</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>The <code>perf</code> tool can auto generate destinations using</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--num-destinations</span> &lt;number of destinations to generate&gt;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>and naming them using the destination name specified as the seed plus an ordered sequence suffix.</p> |
| </div> |
| <div class="paragraph"> |
| <p>e.g.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--num-destinations</span> 3 topic://TOPIC</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>would generate 3 topics: <code>TOPIC0</code>, <code>TOPIC1</code>, <code>TOPIC2</code>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>With the default configuration (without specifying <code>--num-destinations</code>) it would just create <code>TOPIC</code>, without any numerical suffix.</p> |
| </div> |
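| <div class="paragraph"> |
| <p>The same suffix rule should apply to queues as well; e.g. (with a hypothetical seed name):</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">--num-destinations 2 queue://LOAD_QUEUE</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>would generate 2 queues: <code>LOAD_QUEUE0</code> and <code>LOAD_QUEUE1</code>.</p> |
| </div> |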
| <div class="paragraph"> |
| <p>In order to create a load generation on 10 topics, <strong>each</strong> with 3 producers and 2 unshared consumers:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 topic://TOPIC</code></pre> |
| </div> |
| </div> |
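<div class="paragraph">
<p>As a quick sanity check on the footprint of this configuration, the per-destination counts simply multiply out (plain arithmetic, not tool output): 30 producers and 20 consumers in total. Since every topic has 2 subscribers, each sent message is received twice, which is why the reported <code>received</code> rate is roughly double the <code>sent</code> rate.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">producers_per_topic=3
consumers_per_topic=2
topics=10
echo "total producers: $((producers_per_topic * topics))"  # total producers: 30
echo "total consumers: $((consumers_per_topic * topics))"  # total consumers: 20</code></pre>
</div>
</div>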
| <div class="paragraph"> |
| <p>The whole <code>perf client</code> all-out throughput command would be:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="c"># same as in the previous cases</span>
./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span>
<span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC
<span class="c"># this last part above is new</span></code></pre>
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>and it would print…​</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">javax.jms.IllegalStateException: Cannot create durable subscription - client ID has not been <span class="nb">set</span></code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
<p>Given that the generator is creating <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/session#createDurableConsumer-javax.jms.Topic-java.lang.String-">unshared durable Topic subscriptions</a>, it is mandatory to set a ClientID for each connection used.</p>
| </div> |
| <div class="paragraph"> |
<p>The <code>perf client</code> tool creates a connection for each consumer by default and auto-generates both ClientIDs and subscription names (as required by the <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/session#createDurableConsumer-javax.jms.Topic-java.lang.String-">unshared durable Topic subscriptions API</a>).
Users still need to supply a Client ID prefix with <code>--clientID &lt;Client ID prefix&gt;</code>; the tool then takes care of unsubscribing the consumers on test completion.</p>
| </div> |
| <div class="paragraph"> |
<p>The complete command now looks like:</p>
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span> |
| <span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC <span class="nt">--clientID</span> test_id |
| <span class="c"># after few seconds</span> |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 74842 msg/sec |
| <span class="nt">---</span> blocked: 2702 msg/sec |
| <span class="nt">---</span> completed: 74641 msg/sec |
| <span class="nt">---</span> received: 146412 msg/sec |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 37366.13 us - 50.00%: 37119.00 us - 90.00%: 46079.00 us - 99.00%: 68095.00 us - 99.90%: 84479.00 us - 99.99%: 94719.00 us - max: 95743.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 44060.66 us - 50.00%: 43263.00 us - 90.00%: 54527.00 us - 99.00%: 75775.00 us - 99.90%: 87551.00 us - 99.99%: 91135.00 us - max: 91135.00 us |
| <span class="c"># CTRL + C</span> |
| <span class="nt">---</span> SUMMARY |
| <span class="nt">---</span> result: success |
| <span class="nt">---</span> total sent: 2377653 |
| <span class="nt">---</span> total blocked: 80004 |
| <span class="nt">---</span> total completed: 2377653 |
| <span class="nt">---</span> total received: 4755306 |
| <span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 39423.69 us - 50.00%: 38911.00 us - 90.00%: 49663.00 us - 99.00%: 66047.00 us - 99.90%: 85503.00 us - 99.99%: 101887.00 us - max: 115711.00 us |
| <span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 46216.99 us - 50.00%: 45311.00 us - 90.00%: 57855.00 us - 99.00%: 78335.00 us - 99.90%: 97791.00 us - 99.99%: 113151.00 us - max: 125439.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
<p>The results show that <code>transfer time</code> isn’t queuing up, meaning that the subscribers are able to keep up with the producers: hence a reasonable rate to test could be ~80% of the observed <code>sent</code> rate, i.e. <code>--rate 60000</code>:</p>
| </div> |
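<div class="paragraph">
<p>As a quick sanity check, the 80% figure is plain arithmetic on the observed all-out <code>sent</code> rate (rounding up to a target of 60000 is a judgment call, not a tool feature):</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">sent_rate=74842                   # observed all-out sent rate (msg/sec)
target=$((sent_rate * 80 / 100))  # 80% of the perceived capacity
echo "$target"                    # prints 59873, rounded up to --rate 60000</code></pre>
</div>
</div>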
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span> |
| <span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC <span class="nt">--clientID</span> test_id <span class="se">\</span> |
| <span class="nt">--rate</span> 60000 |
| <span class="c"># after many seconds while running</span> |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 55211 msg/sec |
| <span class="nt">---</span> blocked: 2134 msg/sec |
| <span class="nt">---</span> completed: 54444 msg/sec |
| <span class="nt">---</span> received: 111622 msg/sec |
| <span class="nt">---</span> send delay <span class="nb">time</span>: mean: 6306710.04 us - 50.00%: 6094847.00 us - 90.00%: 7766015.00 us - 99.00%: 8224767.00 us - 99.90%: 8257535.00 us - 99.99%: 8257535.00 us - max: 8257535.00 us |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 50072.92 us - 50.00%: 50431.00 us - 90.00%: 57855.00 us - 99.00%: 65023.00 us - 99.90%: 71167.00 us - 99.99%: 71679.00 us - max: 71679.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 63672.92 us - 50.00%: 65535.00 us - 90.00%: 78847.00 us - 99.00%: 86015.00 us - 99.90%: 90623.00 us - 99.99%: 93183.00 us - max: 94719.00 us |
| <span class="c"># it won't get any better :(</span></code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
<p>What’s wrong with the <code>send delay time</code>?<br>
The results show that the load generator cannot keep up with the expected rate and is accumulating a huge delay against the scheduled load: let’s try fixing it by adding more producer threads with</p>
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--threads</span> &lt;producer threads&gt;</code></pre>
| </div> |
| </div> |
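<div class="paragraph">
<p>A back-of-the-envelope way to see why a single generator thread can fall behind: at the target rate, each send must be scheduled within a tiny time budget, and any pause accumulates as delay. (Hypothetical arithmetic; the tool’s internal scheduling may differ.)</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">rate=60000                         # overall target msg/sec
budget_us=$((1000000 / rate))      # time budget per send, in microseconds
echo "${budget_us} us"             # prints 16 us: little room for pauses
threads=2
echo "$((budget_us * threads)) us" # prints 32 us per send per thread with --threads 2</code></pre>
</div>
</div>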
| <div class="paragraph"> |
<p>Using two producer threads, the command now looks like:</p>
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span> |
| <span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC <span class="nt">--clientID</span> test_id <span class="se">\</span> |
| <span class="nt">--rate</span> 60000 <span class="nt">--threads</span> 2 |
| <span class="c"># after few seconds warming up....</span> |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 59894 msg/sec |
| <span class="nt">---</span> blocked: 694 msg/sec |
| <span class="nt">---</span> completed: 58925 msg/sec |
| <span class="nt">---</span> received: 114857 msg/sec |
| <span class="nt">---</span> send delay <span class="nb">time</span>: mean: 3189.96 us - 50.00%: 277.00 us - 90.00%: 10623.00 us - 99.00%: 35583.00 us - 99.90%: 47871.00 us - 99.99%: 56063.00 us - max: 58367.00 us |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 31500.93 us - 50.00%: 31231.00 us - 90.00%: 48383.00 us - 99.00%: 65535.00 us - 99.90%: 83455.00 us - 99.99%: 95743.00 us - max: 98303.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 38151.21 us - 50.00%: 37119.00 us - 90.00%: 55807.00 us - 99.00%: 84479.00 us - 99.90%: 104959.00 us - 99.99%: 118271.00 us - max: 121855.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
<p><code>send delay time</code> now seems under control, meaning that the load generator just needed some tuning in order to work at its best.</p>
| </div> |
| </div> |
| </div> |
| </div> |
| </body> |
| </html> |