<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="Asciidoctor 2.0.18">
<link rel="icon" type="image/png" href="images/favicon.png">
<title>Performance Tools</title>
<link rel="stylesheet" href="css/asciidoctor.css">
<link rel="stylesheet" href="css/font-awesome.css">
<link rel="stylesheet" href="css/rouge-github.css">
</head>
<body class="book toc2 toc-left">
<div id="header">
<h1>Performance Tools</h1>
<div id="toc" class="toc2">
<div id="toctitle"><a href="index.html">User Manual for 2.33.0</a></div>
<ul class="sectlevel1">
<li><a href="#case-1-single-producer-single-consumer-over-a-queue">1. Case 1: Single producer Single consumer over a queue</a>
<ul class="sectlevel2">
<li><a href="#live-latency-console-reporting">1.1. Live Latency Console Reporting</a></li>
<li><a href="#how-to-read-the-live-statistics">1.2. How to read the live statistics?</a></li>
<li><a href="#how-to-compare-latencies-across-tests">1.3. How to compare latencies across tests?</a></li>
</ul>
</li>
<li><a href="#case-2-target-rate-single-producer-single-consumer-over-a-queue">2. Case 2: Target Rate Single producer Single consumer over a queue</a>
<ul class="sectlevel2">
<li><a href="#what-delay-send-time-means">2.1. What does <code>delay send time</code> mean?</a></li>
</ul>
</li>
<li><a href="#case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers">3. Case 3: Target Rate load on 10 durable topics, each with 3 producers and 2 unshared consumers</a></li>
</ul>
</div>
</div>
<div id="content">
<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>Artemis provides some built-in performance test tools based on the <a href="https://javaee.github.io/jms-spec/pages/JMS20FinalRelease">JMS 2 API</a> to help users (and developers) to stress test a configured Artemis broker instance in different scenarios.</p>
</div>
<div class="paragraph">
<p>These command-line tools won&#8217;t represent a full-fat benchmark (such as <a href="https://openmessaging.cloud/docs/benchmarks/">Open Messaging</a>), but can be used as building blocks to produce one.
They are also quite useful on their own.</p>
</div>
<div class="paragraph">
<p>In summary, the provided <code>perf</code> tools are:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p><code>producer</code> tool: it can generate either all-out throughput or target-rate load, using <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/bytesmessage">BytesMessage</a> of a configured size</p>
</li>
<li>
<p><code>consumer</code> tool: it uses a <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/messagelistener">MessageListener</a> to consume messages sent by the <code>producer</code> command</p>
</li>
<li>
<p><code>client</code> tool: it combines both of the above tools in a single command</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>Most users will just need the <code>client</code> tool, but the <code>producer</code> and <code>consumer</code> tools allow performing tests in additional scenarios:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>delaying consumer start, in order to cause the broker to page</p>
</li>
<li>
<p>running producers and consumers on different machines</p>
</li>
<li>
<p>&#8230;&#8203;</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The examples below (running on a <code>64 bit Linux 5.14 with Intel® Core™ i7-9850H CPU @ 2.60GHz × 12 with Turbo Boost disabled, 32 GB of RAM and SSD</code>) show different use cases of increasing complexity.
As they progress, some internal architectural details of the tool and the configuration options it supports are explored.</p>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<div class="paragraph">
<p>The tools can run either from within the broker&#8217;s instance folder or from the home folder.
In the former case they use the same JVM parameters configured on the instance (in <code>artemis.profile</code>), while in the latter case the user should set the <code>JAVA_ARGS</code> environment variable to override the default heap and GC parameters (e.g. <code>-XX:+UseParallelGC -Xms512M -Xmx1024M</code>).</p>
</div>
</td>
</tr>
</table>
</div>
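<div class="paragraph">
<p>As a quick illustration (the flags below are just the example values mentioned above, not mandatory), overriding the defaults before launching the tools from the home folder could look like:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"># illustrative override: pick whatever heap/GC flags suit the test machine
export JAVA_ARGS="-XX:+UseParallelGC -Xms512M -Xmx1024M"
# check what the tools will pick up
echo "$JAVA_ARGS"</code></pre>
</div>
</div>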
</div>
</div>
<div class="sect1">
<h2 id="case-1-single-producer-single-consumer-over-a-queue"><a class="anchor" href="#case-1-single-producer-single-consumer-over-a-queue"></a><a class="link" href="#case-1-single-producer-single-consumer-over-a-queue">1. Case 1: Single producer Single consumer over a queue</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>This is the simplest possible case: running a load test with 1 producer and 1 consumer on a non-durable queue <code>TEST_QUEUE</code>, using <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/deliverymode#NON_PERSISTENT">non-persistent</a> 1024-byte (by default) messages, using <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/session#AUTO_ACKNOWLEDGE">auto-acknowledge</a>.</p>
</div>
<div class="paragraph">
<p>Let&#8217;s see what happens after typing:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client queue://TEST_QUEUE
Connection brokerURL <span class="o">=</span> tcp://localhost:61616
2022-01-18 10:30:54,535 WARN <span class="o">[</span>org.apache.activemq.artemis.core.client] AMQ212053: CompletionListener/SendAcknowledgementHandler used with <span class="nv">confirmationWindowSize</span><span class="o">=</span><span class="nt">-1</span><span class="nb">.</span> Enable confirmationWindowSize to receive acks from server!
<span class="nt">---</span> warmup <span class="nb">false</span>
<span class="nt">---</span> sent: 7316 msg/sec
<span class="nt">---</span> blocked: 6632 msg/sec
<span class="nt">---</span> completed: 7320 msg/sec
<span class="nt">---</span> received: 7317 msg/sec
<span class="c"># ...</span></code></pre>
</div>
</div>
<div class="paragraph">
<p>The test keeps running until a <code>SIGTERM</code> or <code>SIGINT</code> signal is sent to the Java process (on a Linux console this translates into pressing <strong>CTRL + C</strong>).
Before looking at what the metrics mean, there&#8217;s an initial <code>WARN</code> log that shouldn&#8217;t be ignored:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">WARN <span class="o">[</span>org.apache.activemq.artemis.core.client] AMQ212053: CompletionListener/SendAcknowledgementHandler used with <span class="nv">confirmationWindowSize</span><span class="o">=</span><span class="nt">-1</span><span class="nb">.</span> Enable confirmationWindowSize to receive acks from server!</code></pre>
</div>
</div>
<div class="paragraph">
<p>It shows two things:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>the load generator uses <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/messageproducer#send-javax.jms.Destination-javax.jms.Message-javax.jms.CompletionListener-">async message producers</a></p>
</li>
<li>
<p><code>confirmationWindowSize</code> is an Artemis CORE protocol specific setting;
the <code>perf</code> commands use CORE as the default JMS provider</p>
</li>
</ol>
</div>
<div class="sect2">
<h3 id="live-latency-console-reporting"><a class="anchor" href="#live-latency-console-reporting"></a><a class="link" href="#live-latency-console-reporting">1.1. Live Latency Console Reporting</a></h3>
<div class="paragraph">
<p>The <code>perf client</code> command can report different latency percentile metrics on the console by adding <code>--show-latency</code> to the command arguments, but in order to obtain meaningful metrics we need to address the <code>WARN</code> above by setting <code>confirmationWindowSize</code> on the producer <code>url</code>, while setting <code>--consumer-url</code> to avoid applying the same configuration to the consumer(s).</p>
</div>
<div class="paragraph">
<p>In short, the command is using these additional parameters:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616</code></pre>
</div>
</div>
<div class="sect3">
<h4 id="running-it"><a class="anchor" href="#running-it"></a><a class="link" href="#running-it">1.1.1. Running it</a></h4>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 queue://TEST_QUEUE
<span class="nt">---</span> warmup <span class="nb">false</span>
<span class="nt">---</span> sent: 8114 msg/sec
<span class="nt">---</span> blocked: 8114 msg/sec
<span class="nt">---</span> completed: 8114 msg/sec
<span class="nt">---</span> received: 8113 msg/sec
<span class="nt">---</span> send ack <span class="nb">time</span>: mean: 113.01 us - 50.00%: 106.00 us - 90.00%: 142.00 us - 99.00%: 204.00 us - 99.90%: 371.00 us - 99.99%: 3455.00 us - max: 3455.00 us
<span class="nt">---</span> transfer <span class="nb">time</span>: mean: 213.71 us - 50.00%: 126.00 us - 90.00%: 177.00 us - 99.00%: 3439.00 us - 99.90%: 7967.00 us - 99.99%: 8895.00 us - max: 8895.00 us
<span class="c"># CTRL + C pressed</span>
<span class="nt">---</span> SUMMARY
<span class="nt">---</span> result: success
<span class="nt">---</span> total sent: 70194
<span class="nt">---</span> total blocked: 70194
<span class="nt">---</span> total completed: 70194
<span class="nt">---</span> total received: 70194
<span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 101.53 us - 50.00%: 86.00 us - 90.00%: 140.00 us - 99.00%: 283.00 us - 99.90%: 591.00 us - 99.99%: 2007.00 us - max: 24959.00 us
<span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 127.48 us - 50.00%: 97.00 us - 90.00%: 166.00 us - 99.00%: 449.00 us - 99.90%: 4671.00 us - 99.99%: 8255.00 us - max: 27263.00 us</code></pre>
</div>
</div>
<div class="paragraph">
<p>Some notes:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p><code>WARN</code> message is now gone</p>
</li>
<li>
<p><code>send ack time</code> and <code>transfer time</code> statistics are printed at one-second intervals</p>
</li>
<li>
<p><code>total</code> and <code>aggregated</code> metrics are printed on test completion (more on this later)</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>The meanings of the live latency statistics are:</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1">send ack time</dt>
<dd>
<p>percentiles of latency to acknowledge sent messages</p>
</dd>
<dt class="hdlist1">transfer time</dt>
<dd>
<p>percentiles of latency to transfer messages from producer(s) to consumer(s)</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p>The <code>perf</code> commands use <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/messageproducer#send-javax.jms.Destination-javax.jms.Message-javax.jms.CompletionListener-">JMS 2 async message producers</a> that allow the load generator to accumulate in-flight sent messages and, depending on the protocol implementation, may block its producer thread due to producer flow control.
For example, the Artemis CORE protocol can block producer threads while refilling producer credits, while <a href="https://qpid.apache.org/components/jms/index.html">QPID-JMS</a> won&#8217;t.</p>
</div>
<div class="paragraph">
<p>The <code>perf</code> tool implements its own tracking of in-flight sent requests and can be configured to limit the number of pending sent messages, while reporting the rate at which producers are "blocked" awaiting completions.</p>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<div class="title">Are producer threads <code>blocked</code>?</div>
<div class="paragraph">
<p>Although the load back-pressure mechanism is non-blocking, given that the load generator cannot push further load while back-pressured by the protocol client, the load is semantically "blocked".
This detail is relevant for explaining the live rate <a href="#running-it">statistics</a> on the console:</p>
</div>
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p>By default, the <code>perf</code> tools (i.e. <code>client</code> and <code>producer</code>) <strong>limit the number of in-flight requests to 1</strong>: to change the default setting, users should add the <code>--max-pending</code> parameter.</p>
</div>
<div class="admonitionblock warning">
<table>
<tr>
<td class="icon">
<i class="fa icon-warning" title="Warning"></i>
</td>
<td class="content">
<div class="paragraph">
<p>Setting <code>--max-pending 0</code> will disable the load generator in-flight sent messages limiter, allowing the tool to accumulate an unbounded number of in-flight messages, risking <code>OutOfMemoryError</code>.</p>
</div>
<div class="paragraph">
<p>This is <strong>NOT RECOMMENDED!</strong></p>
</div>
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p>More detail on the metrics:</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1">warmup</dt>
<dd>
<p>The generator phase during which the statistics sample is collected; the warmup duration can be set with <code>--warmup</code></p>
</dd>
<dt class="hdlist1">sent</dt>
<dd>
<p>The message sent rate</p>
</dd>
<dt class="hdlist1">blocked</dt>
<dd>
<p>The rate of attempts to send a new message, "blocked" awaiting <code>--max-pending</code> refill</p>
</dd>
<dt class="hdlist1">completed</dt>
<dd>
<p>The rate of message send acknowledgements received by producer(s)</p>
</dd>
<dt class="hdlist1">received</dt>
<dd>
<p>The rate of messages received by consumer(s)</p>
</dd>
</dl>
</div>
</div>
</div>
<div class="sect2">
<h3 id="how-to-read-the-live-statistics"><a class="anchor" href="#how-to-read-the-live-statistics"></a><a class="link" href="#how-to-read-the-live-statistics">1.2. How to read the live statistics?</a></h3>
<div class="paragraph">
<p>The huge amount of <code>blocked</code> vs <code>sent</code> means that the broker wasn&#8217;t fast enough to refill the single <code>--max-pending</code> budget before a new message was sent.
It can be changed to:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--max-pending</span> 100</code></pre>
</div>
</div>
<div id="our-previous-command" class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 queue://TEST_QUEUE
Connection brokerURL <span class="o">=</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000
<span class="c"># first samples show very BAD performance because the client JVM is still warming up</span>
<span class="nt">---</span> warmup <span class="nb">true</span>
<span class="nt">---</span> sent: 27366 msg/sec
<span class="nt">---</span> blocked: 361 msg/sec
<span class="nt">---</span> completed: 27305 msg/sec
<span class="nt">---</span> received: 26195 msg/sec
<span class="nt">---</span> send ack <span class="nb">time</span>: mean: 1743.39 us - 50.00%: 1551.00 us - 90.00%: 3119.00 us - 99.00%: 5215.00 us - 99.90%: 8575.00 us - 99.99%: 8703.00 us - max: 23679.00 us
<span class="nt">---</span> transfer <span class="nb">time</span>: mean: 11860.32 us - 50.00%: 11583.00 us - 90.00%: 18559.00 us - 99.00%: 24319.00 us - 99.90%: 31359.00 us - 99.99%: 31615.00 us - max: 31615.00 us
<span class="c"># ... &gt; 20 seconds later ...</span>
<span class="c"># performance is now way better than during warmup</span>
<span class="nt">---</span> warmup <span class="nb">false</span>
<span class="nt">---</span> sent: 86525 msg/sec
<span class="nt">---</span> blocked: 5734 msg/sec
<span class="nt">---</span> completed: 86525 msg/sec
<span class="nt">---</span> received: 86556 msg/sec
<span class="nt">---</span> send ack <span class="nb">time</span>: mean: 1109.13 us - 50.00%: 1103.00 us - 90.00%: 1447.00 us - 99.00%: 1687.00 us - 99.90%: 5791.00 us - 99.99%: 5983.00 us - max: 5983.00 us
<span class="nt">---</span> transfer <span class="nb">time</span>: mean: 4662.94 us - 50.00%: 1679.00 us - 90.00%: 12159.00 us - 99.00%: 14079.00 us - 99.90%: 14527.00 us - 99.99%: 14783.00 us - max: 14783.00 us
<span class="c"># CTRL + C</span>
<span class="nt">---</span> SUMMARY
<span class="nt">---</span> result: success
<span class="nt">---</span> total sent: 3450389
<span class="nt">---</span> total blocked: 168863
<span class="nt">---</span> total completed: 3450389
<span class="nt">---</span> total received: 3450389
<span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 1056.09 us - 50.00%: 1003.00 us - 90.00%: 1423.00 us - 99.00%: 1639.00 us - 99.90%: 4287.00 us - 99.99%: 7103.00 us - max: 19583.00 us
<span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 18647.51 us - 50.00%: 10751.00 us - 90.00%: 54271.00 us - 99.00%: 84991.00 us - 99.90%: 90111.00 us - 99.99%: 93183.00 us - max: 94207.00 us</code></pre>
</div>
</div>
<div class="paragraph">
<p>Some notes on the results:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>we now have a reasonable <code>blocked/sent</code> ratio (&lt; ~10%)</p>
</li>
<li>
<p>sent rate has improved <strong>ten-fold</strong> if compared to <a href="#running-it">previous results</a></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>And on the <code>SUMMARY</code> statistics:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><code>total</code> counters include measurements collected with <code>warmup true</code></p>
</li>
<li>
<p><code>aggregated</code> latencies <strong>don&#8217;t</strong> include measurements collected with <code>warmup true</code></p>
</li>
</ul>
</div>
</div>
<div class="sect2">
<h3 id="how-to-compare-latencies-across-tests"><a class="anchor" href="#how-to-compare-latencies-across-tests"></a><a class="link" href="#how-to-compare-latencies-across-tests">1.3. How to compare latencies across tests?</a></h3>
<div class="paragraph">
<p>The console output format isn&#8217;t designed for easy latency comparisons, however the <code>perf</code> commands expose the <code>--hdr &lt;hdr file name&gt;</code> parameter to produce an <a href="http://hdrhistogram.org/">HDR Histogram</a>-compatible report that can be opened with different visualizers, e.g. <a href="https://hdrhistogram.github.io/HdrHistogramJSDemo/logparser.html">Online HdrHistogram Log Analyzer</a>, <a href="https://github.com/ennerf/HdrHistogramVisualizer">HdrHistogramVisualizer</a> or <a href="https://github.com/HdrHistogram/HistogramLogAnalyzer">HistogramLogAnalyzer</a>.</p>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
All latency traces collected in this guide use the <a href="https://hdrhistogram.github.io/HdrHistogramJSDemo/logparser.html">Online HdrHistogram Log Analyzer</a> as the HDR Histogram visualizer tool.
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p>Below is the visualization of the HDR histograms collected after adding the following to the previous benchmark command:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--hdr</span> /tmp/non_durable_queue.hdr</code></pre>
</div>
</div>
<div class="paragraph">
<p>The whole test execution shows tagged latencies, to distinguish the <code>warmup</code> ones:</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/test.png" alt="test">
</div>
</div>
<div class="paragraph">
<p>Filtering out the <code>warmup</code> latencies, it looks like this:</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/hot_test.png" alt="hot test">
</div>
</div>
<div class="paragraph">
<p>The latency results show that at higher percentiles the <code>transfer</code> latency is way higher than the send one (reminder: <code>send ack time</code> is the time to acknowledge sent messages), probably meaning that some queueing-up is happening on the broker.</p>
</div>
<div class="paragraph">
<p>In order to test this theory we switch to <strong>target rate tests</strong>.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="case-2-target-rate-single-producer-single-consumer-over-a-queue"><a class="anchor" href="#case-2-target-rate-single-producer-single-consumer-over-a-queue"></a><a class="link" href="#case-2-target-rate-single-producer-single-consumer-over-a-queue">2. Case 2: Target Rate Single producer Single consumer over a queue</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>The <code>perf client</code> and <code>perf producer</code> tools allow specifying a target rate at which to schedule the producers&#8217; requests, by adding</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--rate</span> &lt;msg/sec integer value&gt;</code></pre>
</div>
</div>
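<div class="paragraph">
<p>A target-rate generator typically schedules each send against an intended timestamp derived from the rate; the arithmetic can be sketched as follows (illustrative only, not the tool&#8217;s actual implementation):</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"># at a target rate r (msg/sec), message i is due i/r seconds after test start
awk -v r=30000 'BEGIN {
  for (i = 0; i != 3; i++)
    printf "msg %d intended at %.2f us\n", i, i / r * 1e6
}'
# prints:
# msg 0 intended at 0.00 us
# msg 1 intended at 33.33 us
# msg 2 intended at 66.67 us</code></pre>
</div>
</div>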
<div class="paragraph">
<p>The previous example&#8217;s <a href="#our-previous-command">last run</a> shows that <code>--max-pending 100</code> keeps the blocked/sent ratio &lt; 10%, with these aggregated latencies:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 1056.09 us - 50.00%: 1003.00 us - 90.00%: 1423.00 us - 99.00%: 1639.00 us - 99.90%: 4287.00 us - 99.99%: 7103.00 us - max: 19583.00 us
<span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 18647.51 us - 50.00%: 10751.00 us - 90.00%: 54271.00 us - 99.00%: 84991.00 us - 99.90%: 90111.00 us - 99.99%: 93183.00 us - max: 94207.00 us</code></pre>
</div>
</div>
<div class="paragraph">
<p>We would like to lower the <code>transfer time</code> to sub-millisecond;
let&#8217;s try by running a load test at ~30% of the max perceived sent rate, by setting:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--rate</span> 30000 <span class="nt">--hdr</span> /tmp/30K.hdr</code></pre>
</div>
</div>
<div class="paragraph">
<p>The whole command is then:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nv">$ </span>./artemis perf client <span class="nt">--rate</span> 30000 <span class="nt">--hdr</span> /tmp/30K.hdr <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 queue://TEST_QUEUE
<span class="c"># ... after 20 warmup seconds ...</span>
<span class="nt">---</span> warmup <span class="nb">false</span>
<span class="nt">---</span> sent: 30302 msg/sec
<span class="nt">---</span> blocked: 0 msg/sec
<span class="nt">---</span> completed: 30302 msg/sec
<span class="nt">---</span> received: 30303 msg/sec
<span class="nt">---</span> send delay <span class="nb">time</span>: mean: 24.20 us - 50.00%: 21.00 us - 90.00%: 54.00 us - 99.00%: 72.00 us - 99.90%: 233.00 us - 99.99%: 659.00 us - max: 731.00 us
<span class="nt">---</span> send ack <span class="nb">time</span>: mean: 150.48 us - 50.00%: 120.00 us - 90.00%: 172.00 us - 99.00%: 1223.00 us - 99.90%: 2543.00 us - 99.99%: 3183.00 us - max: 3247.00 us
<span class="nt">---</span> transfer <span class="nb">time</span>: mean: 171.53 us - 50.00%: 135.00 us - 90.00%: 194.00 us - 99.00%: 1407.00 us - 99.90%: 2607.00 us - 99.99%: 3151.00 us - max: 3183.00 us
<span class="c"># CTRL + C</span>
<span class="nt">---</span> SUMMARY
<span class="nt">---</span> result: success
<span class="nt">---</span> total sent: 1216053
<span class="nt">---</span> total blocked: 845
<span class="nt">---</span> total completed: 1216053
<span class="nt">---</span> total received: 1216053
<span class="nt">---</span> aggregated delay send <span class="nb">time</span>: mean: 35.84 us - 50.00%: 20.00 us - 90.00%: 55.00 us - 99.00%: 116.00 us - 99.90%: 3359.00 us - 99.99%: 5503.00 us - max: 6495.00 us
<span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 147.38 us - 50.00%: 117.00 us - 90.00%: 165.00 us - 99.00%: 991.00 us - 99.90%: 4191.00 us - 99.99%: 5695.00 us - max: 7103.00 us
<span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 178.48 us - 50.00%: 134.00 us - 90.00%: 188.00 us - 99.00%: 1359.00 us - 99.90%: 5471.00 us - 99.99%: 8831.00 us - max: 12799.00 us</code></pre>
</div>
</div>
<div class="paragraph">
<p>We&#8217;ve now achieved sub-millisecond <code>transfer</code> latencies up to the <code>90.00 percentile</code>.
Opening <code>/tmp/30K.hdr</code> makes it easier to see:</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/30K.png" alt="test">
</div>
</div>
<div class="paragraph">
<p>Now <code>send</code> and <code>transfer</code> time look quite similar and there&#8217;s no sign of queueing, but&#8230;&#8203;</p>
</div>
<div class="sect2">
<h3 id="what-delay-send-time-means"><a class="anchor" href="#what-delay-send-time-means"></a><a class="link" href="#what-delay-send-time-means">2.1. What does <code>delay send time</code> mean?</a></h3>
<div class="paragraph">
<p>This metric is borrowed from the <a href="http://highscalability.com/blog/2015/10/5/your-load-generator-is-probably-lying-to-you-take-the-red-pi.html">Coordinated Omission</a> concept, and it measures the delay of producer(s) while trying to send messages at the requested rate.</p>
</div>
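<div class="paragraph">
<p>In other words, for each scheduled send the metric captures (roughly) the gap between when the message was due and when the producer actually fired it; a sketch with made-up numbers:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"># illustrative microsecond timestamps, not taken from a real run
awk 'BEGIN {
  intended = 100   # when the message was scheduled to be sent
  actual   = 135   # when the producer could actually send it
  printf "delay send time: %d us\n", actual - intended
}'
# prints: delay send time: 35 us</code></pre>
</div>
</div>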
<div class="paragraph">
<p>The source of such delay could be:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>a slow-responding broker: the load generator reached <code>--max-pending</code> and the expected rate cannot be honored</p>
</li>
<li>
<p>client running out of resources (lack of CPU time, GC pauses, etc.): the load generator cannot keep up with the expected rate because it is just "too fast" for it</p>
</li>
<li>
<p>protocol-dependent blocking behaviours: CORE JMS 2 async send can block due to <code>producerWindowSize</code> exhaustion</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>A sane run of a target rate test should keep <code>delay send time</code> under control; otherwise, investigation is needed to understand the source of the delay.
Let&#8217;s show it with an example: we&#8217;ve already checked the all-out rate of the broker, i.e. ~90K msg/sec.</p>
</div>
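<div class="paragraph">
<p>To see why this matters, assume the broker momentarily sustains a bit less than the requested rate (say 86K vs 90K msg/sec, in line with the all-out numbers observed earlier); the generator then falls behind its schedule linearly (illustrative arithmetic):</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">awk 'BEGIN {
  service = 86000; target = 90000; seconds = 10
  printf "after %d s the generator is ~%d sends behind schedule\n",
         seconds, (target - service) * seconds
}'
# prints: after 10 s the generator is ~40000 sends behind schedule</code></pre>
</div>
</div>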
<div class="paragraph">
<p>By running a <code>--rate 90000</code> test under the same conditions, latencies will look like this:</p>
</div>
<div class="imageblock">
<div class="content">
<img src="images/90K.png" alt="test">
</div>
</div>
<div class="paragraph">
<p>It clearly shows that the load generator is getting delayed and cannot keep up with the expected rate.</p>
</div>
<div class="paragraph">
<p>Below is a more complex example involving destination (auto-)generation with an "asymmetric" load, i.e. the number of producers differs from the number of consumers.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers"><a class="anchor" href="#case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers"></a><a class="link" href="#case-3-target-rate-load-on-10-durable-topics-each-with-3-producers-and-2-unshared-consumers">3. Case 3: Target Rate load on 10 durable topics, each with 3 producers and 2 unshared consumers</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>The <code>perf</code> tool can auto-generate destinations using</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--num-destinations</span> &lt;number of destinations to generate&gt;</code></pre>
</div>
</div>
<div class="paragraph">
<p>and naming them using the specified destination name as a seed plus an ordered numeric suffix.</p>
</div>
<div class="paragraph">
<p>e.g.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--num-destinations</span> 3 topic://TOPIC</code></pre>
</div>
</div>
<div class="paragraph">
<p>would generate 3 topics: <code>TOPIC0</code>, <code>TOPIC1</code>, <code>TOPIC2</code>.</p>
</div>
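<div class="paragraph">
<p>The naming scheme can be sketched as follows (mirroring the behavior described above):</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">seed=TOPIC
for i in 0 1 2; do
  echo "topic://${seed}${i}"
done
# prints:
# topic://TOPIC0
# topic://TOPIC1
# topic://TOPIC2</code></pre>
</div>
</div>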
<div class="paragraph">
<p>With the default configuration (without specifying <code>--num-destinations</code>) it would just create <code>TOPIC</code>, without any numerical suffix.</p>
</div>
<div class="paragraph">
<p>In order to generate load on 10 topics, <strong>each</strong> with 3 producers and 2 unshared consumers:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 topic://TOPIC</code></pre>
</div>
</div>
<div class="paragraph">
<p>The whole <code>perf client</code> all-out throughput command would be:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="c"># same as in the previous cases</span>
./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span>
<span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC
<span class="c"># this last part above is new</span></code></pre>
</div>
</div>
<div class="paragraph">
<p>and it would print&#8230;&#8203;</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">javax.jms.IllegalStateException: Cannot create durable subscription - client ID has not been <span class="nb">set</span></code></pre>
</div>
</div>
<div class="paragraph">
<p>Given that the generator is creating <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/session#createDurableConsumer-javax.jms.Topic-java.lang.String-">unshared durable Topic subscriptions</a>, it is mandatory to set a ClientID for each connection used.</p>
</div>
<div class="paragraph">
<p>The <code>perf client</code> tool creates a connection for each consumer by default and auto-generates both ClientIDs and subscription names (as required by the <a href="https://jakarta.ee/specifications/messaging/2.0/apidocs/javax/jms/session#createDurableConsumer-javax.jms.Topic-java.lang.String-">unshared durable Topic subscription API</a>).
Users still need to specify a Client ID prefix with <code>--clientID &lt;Client ID prefix&gt;</code>; the tool takes care of unsubscribing the consumers on test completion.</p>
</div>
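As an illustration, the snippet below sketches what the generator has to do for each consumer connection: assign a unique ClientID before the connection is otherwise used, create the unshared durable subscription (identified broker-side by the ClientID/subscription-name pair), and unsubscribe when done. This is a minimal sketch against the JMS 2 API, not the tool's actual code; the ClientID, subscription name, and topic name are illustrative, and it assumes a broker listening on `tcp://localhost:61616` plus the Artemis JMS client on the classpath.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.Topic;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class DurableSubscriberSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
        try (Connection connection = cf.createConnection()) {
            // Must happen before the connection is used in any other way,
            // otherwise createDurableConsumer fails with IllegalStateException.
            connection.setClientID("test_id_0");
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic topic = session.createTopic("DURABLE_TOPIC");
            // Unshared durable consumer: the broker tracks it by (ClientID, subscription name).
            MessageConsumer consumer = session.createDurableConsumer(topic, "sub_0");
            consumer.close();
            // What the perf tool does on test completion so the broker
            // does not keep accumulating messages for a dead subscription.
            session.unsubscribe("sub_0");
        }
    }
}
```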
<div class="paragraph">
<p>The complete command now looks like:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span>
<span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC <span class="nt">--clientID</span> test_id
<span class="c"># after few seconds</span>
<span class="nt">---</span> warmup <span class="nb">false</span>
<span class="nt">---</span> sent: 74842 msg/sec
<span class="nt">---</span> blocked: 2702 msg/sec
<span class="nt">---</span> completed: 74641 msg/sec
<span class="nt">---</span> received: 146412 msg/sec
<span class="nt">---</span> send ack <span class="nb">time</span>: mean: 37366.13 us - 50.00%: 37119.00 us - 90.00%: 46079.00 us - 99.00%: 68095.00 us - 99.90%: 84479.00 us - 99.99%: 94719.00 us - max: 95743.00 us
<span class="nt">---</span> transfer <span class="nb">time</span>: mean: 44060.66 us - 50.00%: 43263.00 us - 90.00%: 54527.00 us - 99.00%: 75775.00 us - 99.90%: 87551.00 us - 99.99%: 91135.00 us - max: 91135.00 us
<span class="c"># CTRL + C</span>
<span class="nt">---</span> SUMMARY
<span class="nt">---</span> result: success
<span class="nt">---</span> total sent: 2377653
<span class="nt">---</span> total blocked: 80004
<span class="nt">---</span> total completed: 2377653
<span class="nt">---</span> total received: 4755306
<span class="nt">---</span> aggregated send <span class="nb">time</span>: mean: 39423.69 us - 50.00%: 38911.00 us - 90.00%: 49663.00 us - 99.00%: 66047.00 us - 99.90%: 85503.00 us - 99.99%: 101887.00 us - max: 115711.00 us
<span class="nt">---</span> aggregated transfer <span class="nb">time</span>: mean: 46216.99 us - 50.00%: 45311.00 us - 90.00%: 57855.00 us - 99.00%: 78335.00 us - 99.90%: 97791.00 us - 99.99%: 113151.00 us - max: 125439.00 us</code></pre>
</div>
</div>
<div class="paragraph">
<p>Results show that <code>transfer time</code> isn&#8217;t queuing up, meaning that the subscribers are able to keep up with the producers: hence a reasonable rate to test could be ~80% of the observed <code>sent</code> rate, i.e. <code>--rate 60000</code>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span>
<span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC <span class="nt">--clientID</span> test_id <span class="se">\</span>
<span class="nt">--rate</span> 60000
<span class="c"># after many seconds while running</span>
<span class="nt">---</span> warmup <span class="nb">false</span>
<span class="nt">---</span> sent: 55211 msg/sec
<span class="nt">---</span> blocked: 2134 msg/sec
<span class="nt">---</span> completed: 54444 msg/sec
<span class="nt">---</span> received: 111622 msg/sec
<span class="nt">---</span> send delay <span class="nb">time</span>: mean: 6306710.04 us - 50.00%: 6094847.00 us - 90.00%: 7766015.00 us - 99.00%: 8224767.00 us - 99.90%: 8257535.00 us - 99.99%: 8257535.00 us - max: 8257535.00 us
<span class="nt">---</span> send ack <span class="nb">time</span>: mean: 50072.92 us - 50.00%: 50431.00 us - 90.00%: 57855.00 us - 99.00%: 65023.00 us - 99.90%: 71167.00 us - 99.99%: 71679.00 us - max: 71679.00 us
<span class="nt">---</span> transfer <span class="nb">time</span>: mean: 63672.92 us - 50.00%: 65535.00 us - 90.00%: 78847.00 us - 99.00%: 86015.00 us - 99.90%: 90623.00 us - 99.99%: 93183.00 us - max: 94719.00 us
<span class="c"># it won't get any better :(</span></code></pre>
</div>
</div>
<div class="paragraph">
<p>What&#8217;s wrong with the <code>send delay time</code>?<br>
Results show that the load generator cannot keep up with the requested rate and is accumulating a huge delay against the scheduled load: let&#8217;s try to fix it by adding more producer threads with</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash"><span class="nt">--threads</span> &lt;producer threads&gt;</code></pre>
</div>
</div>
<div class="paragraph">
<p>Using two producer threads, the command now looks like:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight nowrap"><code data-lang="bash">./artemis perf client <span class="nt">--warmup</span> 20 <span class="nt">--max-pending</span> 100 <span class="nt">--show-latency</span> <span class="nt">--url</span> tcp://localhost:61616?confirmationWindowSize<span class="o">=</span>20000 <span class="nt">--consumer-url</span> tcp://localhost:61616 <span class="se">\</span>
<span class="nt">--producers</span> 3 <span class="nt">--consumers</span> 2 <span class="nt">--num-destinations</span> 10 <span class="nt">--durable</span> <span class="nt">--persistent</span> topic://DURABLE_TOPIC <span class="nt">--clientID</span> test_id <span class="se">\</span>
<span class="nt">--rate</span> 60000 <span class="nt">--threads</span> 2
<span class="c"># after few seconds warming up....</span>
<span class="nt">---</span> warmup <span class="nb">false</span>
<span class="nt">---</span> sent: 59894 msg/sec
<span class="nt">---</span> blocked: 694 msg/sec
<span class="nt">---</span> completed: 58925 msg/sec
<span class="nt">---</span> received: 114857 msg/sec
<span class="nt">---</span> send delay <span class="nb">time</span>: mean: 3189.96 us - 50.00%: 277.00 us - 90.00%: 10623.00 us - 99.00%: 35583.00 us - 99.90%: 47871.00 us - 99.99%: 56063.00 us - max: 58367.00 us
<span class="nt">---</span> send ack <span class="nb">time</span>: mean: 31500.93 us - 50.00%: 31231.00 us - 90.00%: 48383.00 us - 99.00%: 65535.00 us - 99.90%: 83455.00 us - 99.99%: 95743.00 us - max: 98303.00 us
<span class="nt">---</span> transfer <span class="nb">time</span>: mean: 38151.21 us - 50.00%: 37119.00 us - 90.00%: 55807.00 us - 99.00%: 84479.00 us - 99.90%: 104959.00 us - 99.99%: 118271.00 us - max: 121855.00 us</code></pre>
</div>
</div>
<div class="paragraph">
<p><code>send delay time</code> now seems under control, meaning that the load generator just needed some tuning in order to work at its best.</p>
</div>
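To see why adding a thread helps, consider how a fixed-rate generator schedules sends: if a single blocking send costs more than the interval between scheduled fire times, every message starts later than the one before and the delay grows without bound, while splitting the same total rate across two threads doubles each thread's interval so the send cost fits inside it. The simulation below illustrates the effect with made-up numbers; it is not the perf tool's implementation.

```java
public class SendDelaySketch {
    // Worst "send delay" (us) over n messages for one sender thread that is
    // scheduled to fire every intervalUs, where each blocking send costs sendCostUs.
    static long maxSendDelayUs(long intervalUs, long sendCostUs, int n) {
        long clockUs = 0;      // simulated wall clock
        long maxDelayUs = 0;
        for (int i = 0; i < n; i++) {
            long scheduledUs = i * intervalUs;        // when message i should fire
            if (clockUs < scheduledUs) {
                clockUs = scheduledUs;                // idle until the scheduled time
            }
            maxDelayUs = Math.max(maxDelayUs, clockUs - scheduledUs);
            clockUs += sendCostUs;                    // the send itself consumes time
        }
        return maxDelayUs;
    }

    public static void main(String[] args) {
        // One thread at a 100 us interval with a 150 us send cost: the delay
        // grows by 50 us per message and never recovers.
        System.out.println("1 thread : " + maxSendDelayUs(100, 150, 1000) + " us");
        // Two threads each carry half the rate (200 us interval): the 150 us
        // send cost now fits inside the interval, so no delay accumulates.
        System.out.println("2 threads: " + maxSendDelayUs(200, 150, 1000) + " us");
    }
}
```

The flat delay in the two-thread case is the same effect seen above when `--threads 2` brought <code>send delay time</code> back under control.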
</div>
</div>
</div>
</body>
</html>