| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="generator" content="Asciidoctor 2.0.18"> |
| <link rel="icon" type="image/png" href="images/favicon.png"> |
| <title>Performance Tools</title> |
| <link rel="stylesheet" href="css/asciidoctor.css"> |
| <link rel="stylesheet" href="css/font-awesome.css"> |
| <link rel="stylesheet" href="css/rouge-github.css"> |
| </head> |
| <body class="book toc2 toc-left"> |
| <div id="header"> |
| <h1>Performance Tools</h1> |
| <div id="toc" class="toc2"> |
| <div id="toctitle"><a href="index.html">User Manual for 2.32.0</a></div> |
| <ul class="sectlevel1"> |
| <li><a href="#case-1-single-producer-single-consumer-over-a-queue">1. Case 1: Single producer Single consumer over a queue</a> |
| <ul class="sectlevel2"> |
| <li><a href="#live-latency-console-reporting">1.1. Live Latency Console Reporting</a></li> |
| <li><a href="#how-to-read-the-live-statistics">1.2. How to read the live statistics?</a></li> |
| <li><a href="#how-to-compare-latencies-across-tests">1.3. How to compare latencies across tests?</a></li> |
| </ul> |
| </li> |
| <li><a href="#case-2-target-rate-single-producer-single-consumer-over-a-queue">2. Case 2: Target Rate Single producer Single consumer over a queue</a> |
| <ul class="sectlevel2"> |
| <li><a href="#what-delay-send-time-means">2.1. What does <code>delay send time</code> mean?</a></li> |
| </ul> |
| </li> |
| <li><a href="#case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers">3. Case 3: Target Rate load on 10 durable topics, each with 3 producers and 2 unshared consumers</a></li> |
| </ul> |
| </div> |
| </div> |
| <div id="content"> |
| <div id="preamble"> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>Artemis provides some built-in performance test tools based on the <a href="https://javaee.github.io/jms-spec/pages/JMS20FinalRelease">JMS 2 API</a> to help users (and developers) stress test a configured Artemis broker instance in different scenarios.</p> |
| </div> |
| <div class="paragraph"> |
| <p>These command-line tools won’t represent a full-fat benchmark (such as <a href="https://openmessaging.cloud/docs/benchmarks/">Open Messaging</a>), but can be used as building blocks to produce one. |
| They are also quite useful on their own.</p> |
| </div> |
| <div class="paragraph"> |
| <p>In summary, the provided <code>perf</code> tools are:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p><code>producer</code> tool: it can generate either all-out throughput or target-rate load, using <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/bytesmessage">BytesMessage</a>s of a configured size</p> |
| </li> |
| <li> |
| <p><code>consumer</code> tool: it uses a <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/messagelistener">MessageListener</a> to consume messages sent by the <code>producer</code> command</p> |
| </li> |
| <li> |
| <p><code>client</code> tool: it packs both tools into a single command</p> |
| </li> |
| </ol> |
| </div> |
| <div class="paragraph"> |
| <p>Most users will just need the <code>client</code> tool, but the <code>producer</code> and <code>consumer</code> tools allow performing tests in additional scenarios:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>delaying consumer start, in order to cause the broker to page</p> |
| </li> |
| <li> |
| <p>running producers and consumers on different machines</p> |
| </li> |
| <li> |
| <p>…​</p> |
| </li> |
| </ul> |
| </div> |
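| <div class="paragraph"> |
| <p>For instance, a minimal sketch of the "different machines" scenario (assuming <code>broker-host</code> as a placeholder for the broker’s hostname, and that <code>--url</code> is accepted by the <code>producer</code> and <code>consumer</code> tools as it is by <code>client</code>) could be:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"># on the consumer machine: start the consumer(s) first (or delay this step to cause the broker to page)
$ ./artemis perf consumer --url tcp://broker-host:61616 queue://TEST_QUEUE
# on the producer machine
$ ./artemis perf producer --url tcp://broker-host:61616 queue://TEST_QUEUE</code></pre> |
| </div> |
| </div> |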
| <div class="paragraph"> |
| <p>The examples below (running on a <code>64 bit Linux 5.14 with Intel® Core™ i7-9850H CPU @ 2.60GHz × 12 with Turbo Boost disabled, 32 GB of RAM and SSD</code>) show different use cases of increasing complexity. |
| As they progress, some internal architectural details of the tool and the supported configuration options are explored.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| <div class="paragraph"> |
| <p>The tools can run both from within the broker’s instance folder or from the home folder. |
| In the former case they will use the same JVM parameters configured on the instance (in <code>artemis.profile</code>), while in the latter case the user should set the <code>JAVA_ARGS</code> environment variable to override default heap and GC parameters (e.g. <code>-XX:+UseParallelGC -Xms512M -Xmx1024M</code>).</p> |
| </div> |
| </td> |
| </tr> |
| </table> |
| </div> |
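| <div class="paragraph"> |
| <p>For example, a sketch of a run from the home folder with overridden JVM parameters (the values below are just the ones suggested above, not a tuned configuration):</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"># override default heap and GC parameters before launching the tool
$ export JAVA_ARGS="-XX:+UseParallelGC -Xms512M -Xmx1024M"
$ ./artemis perf client queue://TEST_QUEUE</code></pre> |
| </div> |
| </div> |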
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="case-1-single-producer-single-consumer-over-a-queue"><a class="anchor" href="#case-1-single-producer-single-consumer-over-a-queue"></a><a class="link" href="#case-1-single-producer-single-consumer-over-a-queue">1. Case 1: Single producer Single consumer over a queue</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>This is the simplest possible case: running a load test with 1 producer and 1 consumer on a non-durable queue <code>TEST_QUEUE</code>, using <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/deliverymode#NON_PERSISTENT">non-persistent</a> messages 1024 bytes long (by default), with <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/session#AUTO_ACKNOWLEDGE">auto-acknowledge</a>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Let’s see what happens after typing:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client queue://TEST_QUEUE |
| Connection brokerURL <span class="o">=</span> tcp://localhost:61616 |
| 2022-01-18 10:30:54,535 WARN <span class="o">[</span>org.apache.activemq.artemis.core.client] AMQ212053: CompletionListener/SendAcknowledgementHandler used with <span class="nv">confirmationWindowSize</span><span class="o">=</span><span class="nt">-1</span><span class="nb">.</span> Enable confirmationWindowSize to receive acks from server! |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 7316 msg/sec |
| <span class="nt">---</span> blocked: 6632 msg/sec |
| <span class="nt">---</span> completed: 7320 msg/sec |
| <span class="nt">---</span> received: 7317 msg/sec |
| <span class="c"># ...</span></code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The test keeps running until a <code>SIGTERM</code> or <code>SIGINT</code> signal is sent to the Java process (on a Linux console this translates into pressing <strong>CTRL + C</strong>). |
| Before looking at what the metrics mean, there’s an initial <code>WARN</code> log that shouldn’t be ignored:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">WARN <span class="o">[</span>org.apache.activemq.artemis.core.client] AMQ212053: CompletionListener/SendAcknowledgementHandler used with <span class="nv">confirmationWindowSize</span><span class="o">=</span><span class="nt">-1</span><span class="nb">.</span> Enable confirmationWindowSize to receive acks from server!</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>It shows two things:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p>the load generator uses <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/messageproducer#send-javax.jms.Destination-javax.jms.Message-javax.jms.CompletionListener-">async message producers</a></p> |
| </li> |
| <li> |
| <p><code>confirmationWindowSize</code> is an Artemis CORE protocol specific setting; |
| the <code>perf</code> commands use CORE as the default JMS provider</p> |
| </li> |
| </ol> |
| </div> |
| <div class="sect2"> |
| <h3 id="live-latency-console-reporting"><a class="anchor" href="#live-latency-console-reporting"></a><a class="link" href="#live-latency-console-reporting">1.1. Live Latency Console Reporting</a></h3> |
| <div class="paragraph"> |
| <p>The <code>perf client</code> command can report different latency percentile metrics on the console by adding <code>--show-latency</code> to the command arguments. In order to obtain meaningful metrics, we need to address the <code>WARN</code> by setting <code>confirmationWindowSize</code> on the producer <code>url</code>, while setting <code>--consumer-url</code> avoids applying the same configuration to the consumer(s).</p> |
| </div> |
| <div class="paragraph"> |
| <p>In short, the command is using these additional parameters:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616</code></pre> |
| </div> |
| </div> |
| <div class="sect3"> |
| <h4 id="running-it"><a class="anchor" href="#running-it"></a><a class="link" href="#running-it">1.1.1. Running it</a></h4> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 queue://TEST_QUEUE |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 8114 msg/sec |
| <span class="nt">---</span> blocked: 8114 msg/sec |
| <span class="nt">---</span> completed: 8114 msg/sec |
| <span class="nt">---</span> received: 8113 msg/sec |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 113.01 us - 50.00%: 106.00 us - 90.00%: 142.00 us - 99.00%: 204.00 us - 99.90%: 371.00 us - 99.99%: 3455.00 us - max: 3455.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 213.71 us - 50.00%: 126.00 us - 90.00%: 177.00 us - 99.00%: 3439.00 us - 99.90%: 7967.00 us - 99.99%: 8895.00 us - max: 8895.00 us |
| <span class="c"># CTRL + C pressed</span> |
| <span class="nt">---</span> SUMMARY |
| <span class="nt">---</span> result: success |
| <span class="nt">---</span> total sent: 70194 |
| <span class="nt">---</span> total blocked: 70194 |
| <span class="nt">---</span> total completed: 70194 |
| <span class="nt">---</span> total received: 70194 |
| <span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 101.53 us - 50.00%: 86.00 us - 90.00%: 140.00 us - 99.00%: 283.00 us - 99.90%: 591.00 us - 99.99%: 2007.00 us - max: 24959.00 us |
| <span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 127.48 us - 50.00%: 97.00 us - 90.00%: 166.00 us - 99.00%: 449.00 us - 99.90%: 4671.00 us - 99.99%: 8255.00 us - max: 27263.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Some notes:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p><code>WARN</code> message is now gone</p> |
| </li> |
| <li> |
| <p><code>send ack time</code> and <code>transfer time</code> statistics are printed at one-second intervals</p> |
| </li> |
| <li> |
| <p><code>total</code> and <code>aggregated</code> metrics are printed on test completion (more on this later)</p> |
| </li> |
| </ol> |
| </div> |
| <div class="paragraph"> |
| <p>The meaning of the live latency statistics is:</p> |
| </div> |
| <div class="dlist"> |
| <dl> |
| <dt class="hdlist1">send ack time</dt> |
| <dd> |
| <p>percentiles of latency to acknowledge sent messages</p> |
| </dd> |
| <dt class="hdlist1">transfer time</dt> |
| <dd> |
| <p>percentiles of latency to transfer messages from producer(s) to consumer(s)</p> |
| </dd> |
| </dl> |
| </div> |
| <div class="paragraph"> |
| <p>The <code>perf</code> commands use <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/messageproducer#send-javax.jms.Destination-javax.jms.Message-javax.jms.CompletionListener-">JMS 2 async message producers</a> that allow the load generator to accumulate in-flight sent messages and, depending on the protocol implementation, may block the producer thread due to producer flow control, |
| e.g. the Artemis CORE protocol can block producer threads while refilling producer credits, while <a href="https://qpid.apache.org/components/jms/index.html">QPID-JMS</a> won’t.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The <code>perf</code> tool implements its own in-flight sent-request tracking and can be configured to limit the number of pending sent messages, while reporting the rate at which producers are "blocked" awaiting completions.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| <div class="title">Producer threads are <code>blocked</code>?</div> |
| <div class="paragraph"> |
| <p>Although the load back-pressure mechanism is non-blocking, the load generator cannot push further load while back-pressured by the protocol client, so the load is semantically "blocked". |
| This detail is relevant to explain the live rate <a href="#running-it">statistics</a> on Console:</p> |
| </div> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="paragraph"> |
| <p>By default, the <code>perf</code> tools (i.e. <code>client</code> and <code>producer</code>) <strong>limit the number of in-flight requests to 1</strong>: to change the default setting, users should add the <code>--max-pending</code> parameter.</p> |
| </div> |
| <div class="admonitionblock warning"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-warning" title="Warning"></i> |
| </td> |
| <td class="content"> |
| <div class="paragraph"> |
| <p>Setting <code>--max-pending 0</code> will disable the load generator in-flight sent messages limiter, allowing the tool to accumulate an unbounded number of in-flight messages, risking <code>OutOfMemoryError</code>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>This is <strong>NOT RECOMMENDED!</strong></p> |
| </div> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="paragraph"> |
| <p>More detail on the metrics:</p> |
| </div> |
| <div class="dlist"> |
| <dl> |
| <dt class="hdlist1">warmup</dt> |
| <dd> |
| <p>Whether the generator is in its warm-up phase while the statistics sample is collected; the warm-up duration can be set with <code>--warmup</code></p> |
| </dd> |
| <dt class="hdlist1">sent</dt> |
| <dd> |
| <p>The message sent rate</p> |
| </dd> |
| <dt class="hdlist1">blocked</dt> |
| <dd> |
| <p>The rate of attempts to send a new message, "blocked" awaiting <code>--max-pending</code> refill</p> |
| </dd> |
| <dt class="hdlist1">completed</dt> |
| <dd> |
| <p>The rate of message send acknowledgements received by producer(s)</p> |
| </dd> |
| <dt class="hdlist1">received</dt> |
| <dd> |
| <p>The rate of messages received by consumer(s)</p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="how-to-read-the-live-statistics"><a class="anchor" href="#how-to-read-the-live-statistics"></a><a class="link" href="#how-to-read-the-live-statistics">1.2. How to read the live statistics?</a></h3> |
| <div class="paragraph"> |
| <p>The huge amount of <code>blocked</code> vs <code>sent</code> means that the broker wasn’t fast enough to refill the single <code>--max-pending</code> budget before a new message could be sent. |
| It can be changed into:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--max-pending</span> 100</code></pre> |
| </div> |
| </div> |
| <div id="our-previous-command" class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 queue://TEST_QUEUE |
| Connection brokerURL <span class="o">=</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 |
| <span class="c"># first samples show very BAD performance because the client JVM is still warming up</span> |
| <span class="nt">---</span> warmup <span class="nb">true</span> |
| <span class="nt">---</span> sent: 27366 msg/sec |
| <span class="nt">---</span> blocked: 361 msg/sec |
| <span class="nt">---</span> completed: 27305 msg/sec |
| <span class="nt">---</span> received: 26195 msg/sec |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 1743.39 us - 50.00%: 1551.00 us - 90.00%: 3119.00 us - 99.00%: 5215.00 us - 99.90%: 8575.00 us - 99.99%: 8703.00 us - max: 23679.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 11860.32 us - 50.00%: 11583.00 us - 90.00%: 18559.00 us - 99.00%: 24319.00 us - 99.90%: 31359.00 us - 99.99%: 31615.00 us - max: 31615.00 us |
| <span class="c"># ... > 20 seconds later ...</span> |
| <span class="c"># performance is now way better than during warmup</span> |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 86525 msg/sec |
| <span class="nt">---</span> blocked: 5734 msg/sec |
| <span class="nt">---</span> completed: 86525 msg/sec |
| <span class="nt">---</span> received: 86556 msg/sec |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 1109.13 us - 50.00%: 1103.00 us - 90.00%: 1447.00 us - 99.00%: 1687.00 us - 99.90%: 5791.00 us - 99.99%: 5983.00 us - max: 5983.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 4662.94 us - 50.00%: 1679.00 us - 90.00%: 12159.00 us - 99.00%: 14079.00 us - 99.90%: 14527.00 us - 99.99%: 14783.00 us - max: 14783.00 us |
| <span class="c"># CTRL + C</span> |
| <span class="nt">---</span> SUMMARY |
| <span class="nt">---</span> result: success |
| <span class="nt">---</span> total sent: 3450389 |
| <span class="nt">---</span> total blocked: 168863 |
| <span class="nt">---</span> total completed: 3450389 |
| <span class="nt">---</span> total received: 3450389 |
| <span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 1056.09 us - 50.00%: 1003.00 us - 90.00%: 1423.00 us - 99.00%: 1639.00 us - 99.90%: 4287.00 us - 99.99%: 7103.00 us - max: 19583.00 us |
| <span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 18647.51 us - 50.00%: 10751.00 us - 90.00%: 54271.00 us - 99.00%: 84991.00 us - 99.90%: 90111.00 us - 99.99%: 93183.00 us - max: 94207.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Some notes on the results:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>we now have a reasonable <code>blocked/sent</code> ratio (&lt; ~10%)</p> |
| </li> |
| <li> |
| <p>sent rate has improved <strong>ten-fold</strong> if compared to <a href="#running-it">previous results</a></p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>And on the <code>SUMMARY</code> statistics:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><code>total</code> counters include measurements collected with <code>warmup true</code></p> |
| </li> |
| <li> |
| <p><code>aggregated</code> latencies <strong>don’t</strong> include measurements collected with <code>warmup true</code></p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="how-to-compare-latencies-across-tests"><a class="anchor" href="#how-to-compare-latencies-across-tests"></a><a class="link" href="#how-to-compare-latencies-across-tests">1.3. How to compare latencies across tests?</a></h3> |
| <div class="paragraph"> |
| <p>The Console output format isn’t designed for easy latency comparisons, however the <code>perf</code> commands expose the <code>--hdr &lt;hdr file name&gt;</code> parameter to produce a <a href="http://hdrhistogram.org/">HDR Histogram</a> compatible report that can be opened with different visualizers, e.g. <a href="https://hdrhistogram.github.io/HdrHistogramJSDemo/logparser.html">Online HdrHistogram Log Analyzer</a>, <a href="https://github.com/ennerf/HdrHistogramVisualizer">HdrHistogramVisualizer</a> or <a href="https://github.com/HdrHistogram/HistogramLogAnalyzer">HistogramLogAnalyzer</a>.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| All latency traces collected in this guide use <a href="https://hdrhistogram.github.io/HdrHistogramJSDemo/logparser.html">Online HdrHistogram Log Analyzer</a> as the HDR Histogram visualizer tool. |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="paragraph"> |
| <p>Below is the visualization of the HDR histograms collected while adding the following to the previous benchmark:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--hdr</span> /tmp/non_durable_queue.hdr</code></pre> |
| </div> |
| </div> |
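| <div class="paragraph"> |
| <p>Putting it together with the previous command, the full invocation would look like:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">$ ./artemis perf client --warmup 20 --max-pending 100 --show-latency --hdr /tmp/non_durable_queue.hdr \
  --url tcp://localhost:61616?confirmationWindowSize=20000 --consumer-url tcp://localhost:61616 queue://TEST_QUEUE</code></pre> |
| </div> |
| </div> |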
| <div class="paragraph"> |
| <p>Whole test execution shows tagged latencies, to distinguish <code>warmup</code> ones:</p> |
| </div> |
| <div class="imageblock"> |
| <div class="content"> |
| <img src="images/test.png" alt="test"> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Filtering out <code>warmup</code> latencies, it looks like:</p> |
| </div> |
| <div class="imageblock"> |
| <div class="content"> |
| <img src="images/hot_test.png" alt="hot test"> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The latency results show that at higher percentiles the <code>transfer</code> latency is way higher than the <code>send ack</code> one (reminder: <code>send ack time</code> is the time to acknowledge sent messages), probably meaning that some queuing-up is happening on the broker.</p> |
| </div> |
| <div class="paragraph"> |
| <p>In order to test this theory we switch to <strong>target rate tests</strong>.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="case-2-target-rate-single-producer-single-consumer-over-a-queue"><a class="anchor" href="#case-2-target-rate-single-producer-single-consumer-over-a-queue"></a><a class="link" href="#case-2-target-rate-single-producer-single-consumer-over-a-queue">2. Case 2: Target Rate Single producer Single consumer over a queue</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p><code>perf client</code> and <code>perf producer</code> tools allow specifying a target rate to schedule producer(s) requests: adding</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--rate</span> &lt;msg/sec integer value&gt;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The <a href="#our-previous-command">last run</a> of the previous example shows that <code>--max-pending 100</code> guarantees &lt; 10% blocked/sent messages, with aggregated latencies:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 1056.09 us - 50.00%: 1003.00 us - 90.00%: 1423.00 us - 99.00%: 1639.00 us - 99.90%: 4287.00 us - 99.99%: 7103.00 us - max: 19583.00 us |
| <span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 18647.51 us - 50.00%: 10751.00 us - 90.00%: 54271.00 us - 99.00%: 84991.00 us - 99.90%: 90111.00 us - 99.99%: 93183.00 us - max: 94207.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>We would like to lower <code>transfer time</code> to sub-millisecond values; |
| let’s try by running a load test at ~30% of the maximum perceived sent rate, by setting:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--rate</span> 30000 <span class="nt">--hdr</span> /tmp/30K.hdr</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The whole command is then:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client <span class="nt">--rate</span> 30000 <span class="nt">--hdr</span> /tmp/30K.hdr <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 queue://TEST_QUEUE |
| <span class="c"># ... after 20 warmup seconds ...</span> |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 30302 msg/sec |
| <span class="nt">---</span> blocked: 0 msg/sec |
| <span class="nt">---</span> completed: 30302 msg/sec |
| <span class="nt">---</span> received: 30303 msg/sec |
| <span class="nt">---</span> send delay <span class="nb">time</span>: mean: 24.20 us - 50.00%: 21.00 us - 90.00%: 54.00 us - 99.00%: 72.00 us - 99.90%: 233.00 us - 99.99%: 659.00 us - max: 731.00 us |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 150.48 us - 50.00%: 120.00 us - 90.00%: 172.00 us - 99.00%: 1223.00 us - 99.90%: 2543.00 us - 99.99%: 3183.00 us - max: 3247.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 171.53 us - 50.00%: 135.00 us - 90.00%: 194.00 us - 99.00%: 1407.00 us - 99.90%: 2607.00 us - 99.99%: 3151.00 us - max: 3183.00 us |
| <span class="c"># CTRL + C</span> |
| <span class="nt">---</span> SUMMARY |
| <span class="nt">---</span> result: success |
| <span class="nt">---</span> total sent: 1216053 |
| <span class="nt">---</span> total blocked: 845 |
| <span class="nt">---</span> total completed: 1216053 |
| <span class="nt">---</span> total received: 1216053 |
| <span class="nt">---</span> aggregated delay send <span class="nb">time</span>: mean: 35.84 us - 50.00%: 20.00 us - 90.00%: 55.00 us - 99.00%: 116.00 us - 99.90%: 3359.00 us - 99.99%: 5503.00 us - max: 6495.00 us |
| <span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 147.38 us - 50.00%: 117.00 us - 90.00%: 165.00 us - 99.00%: 991.00 us - 99.90%: 4191.00 us - 99.99%: 5695.00 us - max: 7103.00 us |
| <span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 178.48 us - 50.00%: 134.00 us - 90.00%: 188.00 us - 99.00%: 1359.00 us - 99.90%: 5471.00 us - 99.99%: 8831.00 us - max: 12799.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>We’ve now achieved sub-millisecond <code>transfer</code> latencies up to the <code>90.00 percentile</code>. |
| Opening <code>/tmp/30K.hdr</code> makes it easier to see:</p> |
| </div> |
| <div class="imageblock"> |
| <div class="content"> |
| <img src="images/30K.png" alt="test"> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Now the <code>send</code> and <code>transfer</code> times look quite similar and there’s no sign of queueing, but…​</p> |
| </div> |
| <div class="sect2"> |
| <h3 id="what-delay-send-time-means"><a class="anchor" href="#what-delay-send-time-means"></a><a class="link" href="#what-delay-send-time-means">2.1. What does <code>delay send time</code> mean?</a></h3> |
| <div class="paragraph"> |
| <p>This metric is borrowed from the <a href="http://highscalability.com/blog/2015/10/5/your-load-generator-is-probably-lying-to-you-take-the-red-pi.html">Coordinated Omission</a> concept, and it measures the delay of producer(s) while trying to send messages at the requested rate.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The source of such delay could be:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>slow responding broker: the load generator reached <code>--max-pending</code> and the expected rate cannot be honored</p> |
| </li> |
| <li> |
| <p>client running out of resources (lack of CPU time, GC pauses, etc.): the load generator cannot keep up with the expected rate because it is just "too fast" for it</p> |
| </li> |
| <li> |
| <p>protocol-dependent blocking behaviours: CORE JMS 2 async send can block due to <code>producerWindowSize</code> exhaustion</p> |
| </li> |
| </ul> |
| </div> |
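| <div class="paragraph"> |
| <p>For the last case, a hedged sketch: assuming the CORE <code>producerWindowSize</code> URL parameter is the relevant knob (an assumption worth verifying against the broker documentation for your version), the producer window could be enlarged directly on the connection URL; note the quoting needed because of the <code>&amp;</code> in the shell:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">$ ./artemis perf client --rate 30000 --max-pending 100 --show-latency \
  --url "tcp://localhost:61616?confirmationWindowSize=20000&amp;producerWindowSize=1048576" \
  --consumer-url tcp://localhost:61616 queue://TEST_QUEUE</code></pre> |
| </div> |
| </div> |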
| <div class="paragraph"> |
| <p>A sane run of a target rate test should keep <code>delay send time</code> under control; otherwise, investigation is needed to understand the source of the delay. |
| Let’s show it with an example: we’ve already measured the all-out rate of the broker, i.e. ~90K msg/sec.</p> |
| </div> |
| <div class="paragraph"> |
| <p>By running a <code>--rate 90000</code> test under the same conditions, latencies will look like:</p> |
| </div> |
| <div class="imageblock"> |
| <div class="content"> |
| <img src="images/90K.png" alt="test"> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>It clearly shows that the load generator is getting delayed and cannot keep up with the expected rate.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Below is a more complex example involving destination (auto)generation with "asymmetric" load, i.e. the number of producers differs from the number of consumers.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers"><a class="anchor" href="#case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers"></a><a class="link" href="#case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers">3. Case 3: Target Rate load on 10 durable topics, each with 3 producers and 2 unshared consumers</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>The <code>perf</code> tool can auto generate destinations using</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--num-destinations</span> &lt;number of destinations to generate&gt;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>and naming them using the destination name specified as the seed plus an ordered sequence suffix.</p> |
| </div> |
| <div class="paragraph"> |
| <p>e.g.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--num-destinations</span> 3 topic://TOPIC</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>would generate 3 topics: <code>TOPIC0</code>, <code>TOPIC1</code>, <code>TOPIC2</code>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>With the default configuration (without specifying <code>--num-destinations</code>) it would just create <code>TOPIC</code>, without any numerical suffix.</p> |
| </div> |
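| <div class="paragraph"> |
| <p>The same suffix rule should apply to queues as well; e.g. (with a hypothetical seed name):</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">--num-destinations 2 queue://LOAD_QUEUE</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>would generate 2 queues: <code>LOAD_QUEUE0</code> and <code>LOAD_QUEUE1</code>.</p> |
| </div> |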
| <div class="paragraph"> |
| <p>In order to create a load generation on 10 topics, <strong>each</strong> with 3 producers and 2 unshared consumers:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 topic://TOPIC</code></pre> |
| </div> |
| </div> |
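<div class="paragraph">
<p>As a quick sanity check on the footprint of this configuration, the per-destination counts simply multiply out (plain arithmetic, not tool output): 30 producers and 20 consumers in total. Since every topic has 2 subscribers, each sent message is received twice, which is why the reported <code>received</code> rate is roughly double the <code>sent</code> rate.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">producers_per_topic=3
consumers_per_topic=2
topics=10
echo "total producers: $((producers_per_topic * topics))"  # total producers: 30
echo "total consumers: $((consumers_per_topic * topics))"  # total consumers: 20</code></pre>
</div>
</div>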
| <div class="paragraph"> |
| <p>The whole <code>perf client</code> all-out throughput command would be:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="c"># same as in the previous cases</span>
./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span>
<span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC
<span class="c"># this last part above is new</span></code></pre>
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>and it would print…​</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">javax.jms.IllegalStateException: Cannot create durable subscription - client ID has not been <span class="nb">set</span></code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
<p>Given that the generator is creating <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/session#createDurableConsumer-javax.jms.Topic-java.lang.String-">unshared durable Topic subscriptions</a>, it is mandatory to set a ClientID for each connection used.</p>
| </div> |
| <div class="paragraph"> |
<p>The <code>perf client</code> tool creates a connection for each consumer by default and auto-generates both ClientIDs and subscription names (as required by the <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/session#createDurableConsumer-javax.jms.Topic-java.lang.String-">unshared durable Topic subscriptions API</a>).
Users still need to supply a Client ID prefix with <code>--clientID &lt;Client ID prefix&gt;</code>; the tool then takes care of unsubscribing the consumers on test completion.</p>
| </div> |
| <div class="paragraph"> |
<p>The complete command now looks like:</p>
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span> |
| <span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC <span class="nt">--clientID</span> test_id |
| <span class="c"># after few seconds</span> |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 74842 msg/sec |
| <span class="nt">---</span> blocked: 2702 msg/sec |
| <span class="nt">---</span> completed: 74641 msg/sec |
| <span class="nt">---</span> received: 146412 msg/sec |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 37366.13 us - 50.00%: 37119.00 us - 90.00%: 46079.00 us - 99.00%: 68095.00 us - 99.90%: 84479.00 us - 99.99%: 94719.00 us - max: 95743.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 44060.66 us - 50.00%: 43263.00 us - 90.00%: 54527.00 us - 99.00%: 75775.00 us - 99.90%: 87551.00 us - 99.99%: 91135.00 us - max: 91135.00 us |
| <span class="c"># CTRL + C</span> |
| <span class="nt">---</span> SUMMARY |
| <span class="nt">---</span> result: success |
| <span class="nt">---</span> total sent: 2377653 |
| <span class="nt">---</span> total blocked: 80004 |
| <span class="nt">---</span> total completed: 2377653 |
| <span class="nt">---</span> total received: 4755306 |
| <span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 39423.69 us - 50.00%: 38911.00 us - 90.00%: 49663.00 us - 99.00%: 66047.00 us - 99.90%: 85503.00 us - 99.99%: 101887.00 us - max: 115711.00 us |
| <span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 46216.99 us - 50.00%: 45311.00 us - 90.00%: 57855.00 us - 99.00%: 78335.00 us - 99.90%: 97791.00 us - 99.99%: 113151.00 us - max: 125439.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
<p>The results show that <code>transfer time</code> isn’t queuing up, meaning that the subscribers are able to keep up with the producers: hence a reasonable rate to test could be ~80% of the observed <code>sent</code> rate, i.e. <code>--rate 60000</code>:</p>
| </div> |
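<div class="paragraph">
<p>As a quick sanity check, the 80% figure is plain arithmetic on the observed all-out <code>sent</code> rate (rounding up to a target of 60000 is a judgment call, not a tool feature):</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">sent_rate=74842                   # observed all-out sent rate (msg/sec)
target=$((sent_rate * 80 / 100))  # 80% of the perceived capacity
echo "$target"                    # prints 59873, rounded up to --rate 60000</code></pre>
</div>
</div>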
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span> |
| <span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC <span class="nt">--clientID</span> test_id <span class="se">\</span> |
| <span class="nt">--rate</span> 60000 |
| <span class="c"># after many seconds while running</span> |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 55211 msg/sec |
| <span class="nt">---</span> blocked: 2134 msg/sec |
| <span class="nt">---</span> completed: 54444 msg/sec |
| <span class="nt">---</span> received: 111622 msg/sec |
| <span class="nt">---</span> send delay <span class="nb">time</span>: mean: 6306710.04 us - 50.00%: 6094847.00 us - 90.00%: 7766015.00 us - 99.00%: 8224767.00 us - 99.90%: 8257535.00 us - 99.99%: 8257535.00 us - max: 8257535.00 us |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 50072.92 us - 50.00%: 50431.00 us - 90.00%: 57855.00 us - 99.00%: 65023.00 us - 99.90%: 71167.00 us - 99.99%: 71679.00 us - max: 71679.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 63672.92 us - 50.00%: 65535.00 us - 90.00%: 78847.00 us - 99.00%: 86015.00 us - 99.90%: 90623.00 us - 99.99%: 93183.00 us - max: 94719.00 us |
| <span class="c"># it won't get any better :(</span></code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
<p>What’s wrong with the <code>send delay time</code>?<br>
The results show that the load generator cannot keep up with the expected rate and is accumulating a huge delay against the scheduled load: let’s try fixing it by adding more producer threads with</p>
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--threads</span> &lt;producer threads&gt;</code></pre>
| </div> |
| </div> |
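<div class="paragraph">
<p>A back-of-the-envelope way to see why a single generator thread can fall behind: at the target rate, each send must be scheduled within a tiny time budget, and any pause accumulates as delay. (Hypothetical arithmetic; the tool’s internal scheduling may differ.)</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">rate=60000                         # overall target msg/sec
budget_us=$((1000000 / rate))      # time budget per send, in microseconds
echo "${budget_us} us"             # prints 16 us: little room for pauses
threads=2
echo "$((budget_us * threads)) us" # prints 32 us per send per thread with --threads 2</code></pre>
</div>
</div>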
| <div class="paragraph"> |
<p>Using two producer threads, the command now looks like:</p>
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="rouge highlight nowrap"><code data-lang="bash">./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span> |
| <span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC <span class="nt">--clientID</span> test_id <span class="se">\</span> |
| <span class="nt">--rate</span> 60000 <span class="nt">--threads</span> 2 |
| <span class="c"># after few seconds warming up....</span> |
| <span class="nt">---</span> warmup <span class="nb">false</span> |
| <span class="nt">---</span> sent: 59894 msg/sec |
| <span class="nt">---</span> blocked: 694 msg/sec |
| <span class="nt">---</span> completed: 58925 msg/sec |
| <span class="nt">---</span> received: 114857 msg/sec |
| <span class="nt">---</span> send delay <span class="nb">time</span>: mean: 3189.96 us - 50.00%: 277.00 us - 90.00%: 10623.00 us - 99.00%: 35583.00 us - 99.90%: 47871.00 us - 99.99%: 56063.00 us - max: 58367.00 us |
| <span class="nt">---</span> send ack <span class="nb">time</span>: mean: 31500.93 us - 50.00%: 31231.00 us - 90.00%: 48383.00 us - 99.00%: 65535.00 us - 99.90%: 83455.00 us - 99.99%: 95743.00 us - max: 98303.00 us |
| <span class="nt">---</span> transfer <span class="nb">time</span>: mean: 38151.21 us - 50.00%: 37119.00 us - 90.00%: 55807.00 us - 99.00%: 84479.00 us - 99.90%: 104959.00 us - 99.99%: 118271.00 us - max: 121855.00 us</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
<p><code>send delay time</code> now seems under control, meaning that the load generator just needed some tuning in order to work at its best.</p>
| </div> |
| </div> |
| </div> |
| </div> |
| </body> |
| </html> |