<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia Site Renderer 1.8 from src/site/markdown/metron-contrib/metron-performance/index.md at 2019-05-14
| Rendered using Apache Maven Fluido Skin 1.7
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20190514" />
<meta http-equiv="Content-Language" content="en" />
<title>Metron &#x2013; Performance Utilities</title>
<link rel="stylesheet" href="../../css/apache-maven-fluido-1.7.min.css" />
<link rel="stylesheet" href="../../css/site.css" />
<link rel="stylesheet" href="../../css/print.css" media="print" />
<script type="text/javascript" src="../../js/apache-maven-fluido-1.7.min.js"></script>
<script type="text/javascript">
$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );
</script>
</head>
<body class="topBarDisabled">
<div class="container-fluid">
<div id="banner">
<div class="pull-left"><a href="http://metron.apache.org/" id="bannerLeft"><img src="../../images/metron-logo.png" alt="Apache Metron" width="148px" height="48px"/></a></div>
<div class="pull-right"></div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li class=""><a href="http://www.apache.org" class="externalLink" title="Apache">Apache</a><span class="divider">/</span></li>
<li class=""><a href="http://metron.apache.org/" class="externalLink" title="Metron">Metron</a><span class="divider">/</span></li>
<li class=""><a href="../../index.html" title="Documentation">Documentation</a><span class="divider">/</span></li>
<li class="active ">Performance Utilities</li>
<li id="publishDate" class="pull-right"><span class="divider">|</span> Last Published: 2019-05-14</li>
<li id="projectVersion" class="pull-right">Version: 0.7.1</li>
</ul>
</div>
<div class="row-fluid">
<div id="leftColumn" class="span2">
<div class="well sidebar-nav">
<ul class="nav nav-list">
<li class="nav-header">User Documentation</li>
<li><a href="../../index.html" title="Metron"><span class="icon-chevron-down"></span>Metron</a>
<ul class="nav nav-list">
<li><a href="../../CONTRIBUTING.html" title="CONTRIBUTING"><span class="none"></span>CONTRIBUTING</a></li>
<li><a href="../../Upgrading.html" title="Upgrading"><span class="none"></span>Upgrading</a></li>
<li><a href="../../metron-analytics/index.html" title="Analytics"><span class="icon-chevron-right"></span>Analytics</a></li>
<li><a href="../../metron-contrib/metron-docker/index.html" title="Docker"><span class="none"></span>Docker</a></li>
<li class="active"><a href="#"><span class="none"></span>Performance</a></li>
<li><a href="../../metron-deployment/index.html" title="Deployment"><span class="icon-chevron-right"></span>Deployment</a></li>
<li><a href="../../metron-interface/index.html" title="Interface"><span class="icon-chevron-right"></span>Interface</a></li>
<li><a href="../../metron-platform/index.html" title="Platform"><span class="icon-chevron-right"></span>Platform</a></li>
<li><a href="../../metron-sensors/index.html" title="Sensors"><span class="icon-chevron-right"></span>Sensors</a></li>
<li><a href="../../metron-stellar/stellar-3rd-party-example/index.html" title="Stellar-3rd-party-example"><span class="none"></span>Stellar-3rd-party-example</a></li>
<li><a href="../../metron-stellar/stellar-common/index.html" title="Stellar-common"><span class="icon-chevron-right"></span>Stellar-common</a></li>
<li><a href="../../metron-stellar/stellar-zeppelin/index.html" title="Stellar-zeppelin"><span class="none"></span>Stellar-zeppelin</a></li>
<li><a href="../../use-cases/index.html" title="Use-cases"><span class="icon-chevron-right"></span>Use-cases</a></li>
</ul>
</li>
</ul>
<hr />
<div id="poweredBy">
<div class="clear"></div>
<div class="clear"></div>
<div class="clear"></div>
<div class="clear"></div>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"><img class="builtBy" alt="Built by Maven" src="../../images/logos/maven-feather.png" /></a>
</div>
</div>
</div>
<div id="bodyColumn" class="span10" >
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<h1>Performance Utilities</h1>
<p><a name="Performance_Utilities"></a></p>
<p>This project provides utilities for monitoring and measuring performance.</p>
<div class="section">
<h2><a name="load-tool.sh"></a><tt>load_tool.sh</tt></h2>
<p>The Load tool is intended to do the following:</p>
<ul>
<li>Generate load into Kafka at a specified rate of events per second
<ul>
<li>The messages are taken from a template file that holds one message template per line</li>
<li>The load can be biased (e.g. 80% of the load can come from 20% of the templates)</li>
</ul>
</li>
<li>Monitor the Kafka offsets for a topic to determine the events per second written
<ul>
<li>This could be the topic that you are generating load on</li>
<li>This could be another topic that represents the output of some topology (e.g. generate load on <tt>enrichments</tt> and monitor <tt>indexing</tt> to determine the throughput of the enrichment topology).</li>
</ul>
</li>
</ul>
<div>
<div>
<pre class="source">usage: Generator
 -bs,--sample_bias &lt;BIAS_FILE&gt;         The discrete distribution to bias
                                       the sampling. This is a CSV of 2
                                       columns. The first column is the %
                                       of the templates and the 2nd column
                                       is the probability (0-100) that
                                       it's chosen. For instance:
                                       20,80
                                       80,20
                                       implies that 20% of the templates
                                       will comprise 80% of the output and
                                       the remaining 80% of the templates
                                       will comprise 20% of the output.
 -c,--csv &lt;CSV_FILE&gt;                   A CSV file to emit monitoring data
                                       to. The format is a CSV with the
                                       following schema: timestamp, (name,
                                       eps, historical_mean,
                                       historical_stddev)+
 -cg,--consumer_group &lt;GROUP_ID&gt;       Consumer Group. The default is
                                       load.group
 -e,--eps &lt;EPS&gt;                        The target events per second
 -h,--help                             Generate Help screen
 -k,--kafka_config &lt;CONFIG_FILE&gt;       The kafka config. This is a file
                                       containing a JSON map with the
                                       kafka config.
 -l,--lookback &lt;LOOKBACK&gt;              When summarizing, how many
                                       monitoring periods should we
                                       summarize over? If 0, then no
                                       summary. Default: 5
 -md,--monitor_delta_ms &lt;TIME_IN_MS&gt;   The time (in ms) between monitoring
                                       output. Default is 10000
 -mt,--monitor_topic &lt;TOPIC&gt;           The kafka topic to monitor.
 -ot,--output_topic &lt;TOPIC&gt;            The kafka topic to write to
 -p,--threads &lt;NUM_THREADS&gt;            The number of threads to use when
                                       extracting data. The default is
                                       the number of cores of your
                                       machine.
 -sd,--send_delta_ms &lt;TIME_IN_MS&gt;      The time (in ms) between sending a
                                       batch of messages. Default is 100
 -t,--template &lt;TEMPLATE_FILE&gt;         The template file to use for
                                       generation. This should be a file
                                       with a template per line with
                                       $METRON_TS and $METRON_GUID in the
                                       spots for timestamp and guid, if
                                       you so desire them.
 -tl,--time_limit_ms &lt;MS&gt;              The total amount of time to run
                                       this in milliseconds. By default,
                                       it never stops.
 -z,--zk_quorum &lt;QUORUM&gt;               zookeeper quorum
</pre></div></div>
</div>
<div class="section">
<h2><a name="Templates"></a>Templates</h2>
<p>Messages are drawn from a template file, which holds one message template per line.<br />
For instance, suppose we want to generate JSON maps with the fields <tt>source.type</tt>, <tt>ip_src_addr</tt> and <tt>ip_dst_addr</tt>. Each line of the template file would then hold a template like the following:</p>
<div>
<div>
<pre class="source">{ &quot;source.type&quot; : &quot;asa&quot;, &quot;ip_src_addr&quot; : &quot;127.0.0.1&quot;, &quot;ip_dst_addr&quot; : &quot;191.168.1.1&quot; }
</pre></div></div>
<p>When messages are generated, two special placeholders are replaced: <tt>$METRON_TS</tt> with a timestamp and <tt>$METRON_GUID</tt> with a GUID. We can adjust our previous template to use these like so:</p>
<div>
<div>
<pre class="source">{ &quot;source.type&quot; : &quot;asa&quot;, &quot;ip_src_addr&quot; : &quot;127.0.0.1&quot;, &quot;ip_dst_addr&quot; : &quot;191.168.1.1&quot;, &quot;timestamp&quot; : $METRON_TS, &quot;guid&quot; : &quot;$METRON_GUID&quot; }
</pre></div></div>
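<p>As a rough illustration of this substitution, the shell sketch below fills in both placeholders for a single template line. It is not the tool's implementation: <tt>render_template</tt> is a hypothetical helper, and the fixed timestamp and the <tt>guid-1</tt> value are assumptions chosen only to make the example reproducible.</p>
<div>
<div>
<pre class="source"># Hypothetical helper, not part of load_tool.sh: fill in one template line.
#   $1 = template line, $2 = timestamp in epoch millis, $3 = guid
render_template() {
  local line=$1
  line=${line//\$METRON_TS/$2}      # replace every literal $METRON_TS
  line=${line//\$METRON_GUID/$3}    # replace every literal $METRON_GUID
  printf '%s\n' &quot;$line&quot;
}

# Assumed inputs for illustration only.
template='{ &quot;source.type&quot; : &quot;asa&quot;, &quot;timestamp&quot; : $METRON_TS, &quot;guid&quot; : &quot;$METRON_GUID&quot; }'
render_template &quot;$template&quot; 1520506955047 guid-1
</pre></div></div>
<p>With these inputs the sketch prints the template with <tt>1520506955047</tt> in place of <tt>$METRON_TS</tt> and <tt>guid-1</tt> in place of <tt>$METRON_GUID</tt>.</p>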
<p>A note about the generated GUIDs: they are not globally unique UUIDs; they are unique only within the context of a given generator run.</p></div>
<div class="section">
<h2><a name="Biased_Sampling"></a>Biased Sampling</h2>
<p>This load tool can be configured to use biased sampling. This is useful if you are trying to model data that is not distributed uniformly, as is the case for many types of network data. Generating synthetic data with a distribution similar to that of your regular data exercises components such as caches in the same way and yields a more realistic scenario.</p>
<p>You specify the biases in a CSV file with 2 columns:</p>
<ul>
<li>The first column represents the % of the templates</li>
<li>The second column represents the % of the generated output.</li>
</ul>
<p>A simple example would be to generate samples based on Pareto&#x2019;s principle:</p>
<div>
<div>
<pre class="source">20,80
80,20
</pre></div></div>
<p>With this bias file, the first 20% of the templates in the template file would comprise 80% of the output, and the remaining 80% of the templates would comprise the other 20%.</p>
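<p>The arithmetic behind such a bias file can be sketched in shell. This is only an illustration under stated assumptions: <tt>bias_ranges</tt> is a hypothetical helper, the template count of 10 is made up, and truncating fractional template counts is an assumption rather than the tool's documented rounding behaviour.</p>
<div>
<div>
<pre class="source"># Hypothetical helper, not part of load_tool.sh: for each row of a bias CSV
# read from stdin, print the range of templates it covers and the share of
# the output that each of those templates receives.
#   $1 = total number of templates in the template file
# Assumes every row covers at least one template.
bias_ranges() {
  awk -F, -v n=&quot;$1&quot; '
    {
      count = int(n * $1 / 100)   # templates covered by this row (truncated)
      first = next_index          # 0-based index of the first covered template
      last = first + count - 1
      printf &quot;templates %d-%d: %d%% of output, %.2f%% each\n&quot;, first, last, $2, $2 / count
      next_index = last + 1
    }'
}

# Pareto bias file applied to an assumed template file with 10 lines.
printf '20,80\n80,20\n' | bias_ranges 10
</pre></div></div>
<p>Under these assumptions, each of the first two templates accounts for 40% of the output, while each of the remaining eight accounts for 2.5%.</p>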
<p>A more complex example might be:</p>
<div>
<div>
<pre class="source">20,80
20,5
50,1
10,14
</pre></div></div>
<p>This would imply:</p>
<ul>
<li>The first 20% of the templates would comprise 80% of the output</li>
<li>The next 20% of the templates would comprise 5% of the output</li>
<li>The next 50% of the templates would comprise 1% of the output</li>
<li>The next 10% of the templates would comprise 14% of the output.</li>
</ul></div>
<div class="section">
<h2><a name="CSV_Output"></a>CSV Output</h2>
<p>For those who prefer a different visualization, or who wish to incorporate the output of this tool into an automated test, the <tt>-c</tt> or <tt>--csv</tt> option specifies a file to which monitoring data is written in CSV format.</p>
<p>The CSV columns are as follows:</p>
<ul>
<li>timestamp in epoch millis</li>
</ul>
<p>If you are generating synthetic data, then:</p>
<ul>
<li>&#x201c;generated&#x201d;</li>
<li>The events per second generated</li>
<li>The mean of the events per second generated for the last <tt>k</tt> runs, where <tt>k</tt> is the lookback (set via <tt>-l</tt> and defaulted to <tt>5</tt>)</li>
<li>The standard deviation of the events per second generated for the last <tt>k</tt> runs, where <tt>k</tt> is the lookback (set via <tt>-l</tt> and defaulted to <tt>5</tt>)</li>
</ul>
<p>If you are monitoring a topic, then:</p>
<ul>
<li>&#x201c;throughput measured&#x201d;</li>
<li>The events per second measured</li>
<li>The mean of the events per second measured for the last <tt>k</tt> runs, where <tt>k</tt> is the lookback (set via <tt>-l</tt> and defaulted to <tt>5</tt>)</li>
<li>The standard deviation of the events per second measured for the last <tt>k</tt> runs, where <tt>k</tt> is the lookback (set via <tt>-l</tt> and defaulted to <tt>5</tt>)</li>
</ul>
<p>If you are both generating load and monitoring the throughput of a topic, both groups of columns are included in each row.</p>
<p>An example of CSV output is:</p>
<div>
<div>
<pre class="source">1520506955047,generated,,,,throughput measured,,,
1520506964896,generated,1045,1045,0,throughput measured,,,
1520506974896,generated,1000,1022,31,throughput measured,1002,1002,0
1520506984904,generated,999,1014,26,throughput measured,999,1000,2
1520506994896,generated,1000,1011,22,throughput measured,1000,1000,1
1520507004896,generated,1000,1008,20,throughput measured,1000,1000,1
</pre></div></div>
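<p>For an automated test, the CSV can be post-processed with standard tools. The sketch below is one hypothetical check, not something shipped with the tool: <tt>lagging_periods</tt> is an assumed name, and the column positions (3 for generated events per second, 7 for measured events per second) assume the file was written while both generating and monitoring, as in the example above.</p>
<div>
<div>
<pre class="source"># Hypothetical check, not part of load_tool.sh: read the CSV on stdin and
# print the periods whose measured eps fell below a fraction of generated eps.
#   $1 = minimum acceptable ratio of measured to generated eps, e.g. 0.95
# Rows with blank eps fields (the first periods of a run) are skipped.
lagging_periods() {
  awk -F, -v min=&quot;$1&quot; '
    $3 != &quot;&quot; &amp;&amp; $7 != &quot;&quot; &amp;&amp; $7 &lt; min * $3 {
      print $1, $3, $7    # timestamp, generated eps, measured eps
    }'
}

# Two rows from the example above, then one made-up lagging row (the last).
printf '%s\n' \
  '1520506974896,generated,1000,1022,31,throughput measured,1002,1002,0' \
  '1520506984904,generated,999,1014,26,throughput measured,999,1000,2' \
  '1520506994896,generated,1000,1011,22,throughput measured,900,967,58' | lagging_periods 0.95
</pre></div></div>
<p>Only the made-up last row is reported, as <tt>1520506994896 1000 900</tt>. To check a real run, feed the file on stdin instead, for example <tt>lagging_periods 0.95 &lt; ~/measurements.csv</tt>.</p>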
</div>
<div class="section">
<h2><a name="Use-cases_for_the_Load_Tool"></a>Use-cases for the Load Tool</h2>
<div class="section">
<h3><a name="Measure_Throughput_of_a_Topology"></a>Measure Throughput of a Topology</h3>
<p>One can use the load tool to monitor the performance of a kafka-to-kafka topology. For instance, we could measure the throughput of the enrichment topology by monitoring the <tt>indexing</tt> Kafka topic, to which it writes its output:</p>
<div>
<div>
<pre class="source">$METRON_HOME/bin/load_tool.sh -mt indexing -z $ZOOKEEPER
</pre></div></div>
</div>
<div class="section">
<h3><a name="Generate_Synthetic_Load_and_Measure_Performance"></a>Generate Synthetic Load and Measure Performance</h3>
<p>One can use the load tool to generate synthetic load and monitor the performance of a kafka-to-kafka topology, for instance the enrichment topology. It is advisable to have the enrichment topology read from a new topic and write to a new topic so as not to pollute your downstream indices. For instance, we could generate load on a new Kafka topic called <tt>enrichments_load</tt> (which creates it), create another new Kafka topic called <tt>indexing_load</tt>, and configure the enrichment topology to output to it. We would then generate load on <tt>enrichments_load</tt> and monitor <tt>indexing_load</tt>.</p>
<div>
<div>
<pre class="source"># Thread pool of size 5; you want somewhere between 5 and 10 depending on the throughput numbers you're trying to drive
# Messages drawn from ~/dummy.templates, which has a message template per line
# Generate at a rate of 9000 messages per second
# Emit the data to a CSV file ~/measurements.csv
$METRON_HOME/bin/load_tool.sh -p 5 -ot enrichments_load -mt indexing_load -t ~/dummy.templates -e 9000 -z $ZOOKEEPER -c ~/measurements.csv
</pre></div></div>
<p>Now, with the help of a bash function and gnuplot we can generate a plot of the historical throughput measurements for <tt>indexing_load</tt>:</p>
<div>
<div>
<pre class="source"># Ensure that you have installed gnuplot and the liberation font package
# via yum install -y gnuplot liberation-sans-fonts
# We will define a plot function that will generate a png plot. It takes
# one arg, the output file. It expects to have a 2 column CSV streamed
# with the first dimension being the timestamp and the second dimension
# being what you want plotted.
plot() {
awk -F, '{printf &quot;%d %d\n&quot;, $1/1000, $2} END { print &quot;e&quot; }' | gnuplot -e &quot;reset;clear;set style fill solid 1.0 border -1; set nokey;set title 'Throughput Measured'; set xlabel 'Time'; set boxwidth 0.5; set xtics rotate; set ylabel 'events/sec';set xdata time; set timefmt '%s';set format x '%H:%M:%S';set term png enhanced font '/usr/share/fonts/liberation/LiberationSans-Regular.ttf' 12 size 900,400; set output '$1';plot '&lt; cat -' using 1:2 with line lt -1 lw 2;&quot;
}
# Extract the timestamp (column 1) and the historical mean of the measured
# throughput (column 8) as a 2 column CSV and stream it to the plot function.
awk -F, '{printf &quot;%d,%d\n&quot;, $1, $8 }' ~/measurements.csv | plot performance_measurement.png
</pre></div></div>
<p>This writes a plot like the following to <tt>performance_measurement.png</tt>: <img src="../../images/performance_measurement.png" alt="Performance Measurement" /></p></div></div>
</div>
</div>
</div>
<hr/>
<footer>
<div class="container-fluid">
<div class="row-fluid">
© 2015-2016 The Apache Software Foundation. Apache Metron, Metron, Apache, the Apache feather logo,
and the Apache Metron project logo are trademarks of The Apache Software Foundation.
</div>
</div>
</footer>
</body>
</html>