<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="author" content="Apache Software Foundation">
<link rel="shortcut icon" href="../../img/favicon.ico">
<title>Gobblin Metrics Performance - Apache Gobblin</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
<link href="../../css/extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Gobblin Metrics Performance";
var mkdocs_page_input_path = "metrics/Gobblin-Metrics-Performance.md";
var mkdocs_page_url = null;
</script>
<script src="../../js/jquery-2.1.1.min.js" defer></script>
<script src="../../js/modernizr-2.8.3.min.js" defer></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../.." class="icon icon-home"> Apache Gobblin</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" title="Type search term here" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="/">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="../../Powered-By/">Companies Powered By Gobblin</a>
</li>
<li class="toctree-l1">
<a class="" href="../../Getting-Started/">Getting Started</a>
</li>
<li class="toctree-l1">
<a class="" href="../../Gobblin-Architecture/">Architecture</a>
</li>
<li class="toctree-l1">
<span class="caption-text">User Guide</span>
<ul class="subnav">
<li class="">
<a class="" href="../../user-guide/Working-with-Job-Configuration-Files/">Job Configuration Files</a>
</li>
<li class="">
<a class="" href="../../user-guide/Gobblin-Deployment/">Deployment</a>
</li>
<li class="">
<a class="" href="../../user-guide/Gobblin-as-a-Library/">Gobblin as a Library</a>
</li>
<li class="">
<a class="" href="../../user-guide/Gobblin-CLI/">Gobblin CLI</a>
</li>
<li class="">
<a class="" href="../../user-guide/Gobblin-Compliance/">Gobblin Compliance</a>
</li>
<li class="">
<a class="" href="../../user-guide/Gobblin-on-Yarn/">Gobblin on Yarn</a>
</li>
<li class="">
<a class="" href="../../user-guide/Compaction/">Compaction</a>
</li>
<li class="">
<a class="" href="../../user-guide/State-Management-and-Watermarks/">State Management and Watermarks</a>
</li>
<li class="">
<a class="" href="../../user-guide/Working-with-the-ForkOperator/">Fork Operator</a>
</li>
<li class="">
<a class="" href="../../user-guide/Configuration-Properties-Glossary/">Configuration Glossary</a>
</li>
<li class="">
<a class="" href="../../user-guide/Source-schema-and-Converters/">Source schema and Converters</a>
</li>
<li class="">
<a class="" href="../../user-guide/Partitioned-Writers/">Partitioned Writers</a>
</li>
<li class="">
<a class="" href="../../user-guide/Monitoring/">Monitoring</a>
</li>
<li class="">
<a class="" href="../../user-guide/Gobblin-template/">Template</a>
</li>
<li class="">
<a class="" href="../../user-guide/Gobblin-Schedulers/">Schedulers</a>
</li>
<li class="">
<a class="" href="../../user-guide/Job-Execution-History-Store/">Job Execution History Store</a>
</li>
<li class="">
<a class="" href="../../user-guide/Building-Gobblin/">Building Gobblin</a>
</li>
<li class="">
<a class="" href="../../user-guide/Gobblin-genericLoad/">Generic Configuration Loading</a>
</li>
<li class="">
<a class="" href="../../user-guide/Hive-Registration/">Hive Registration</a>
</li>
<li class="">
<a class="" href="../../user-guide/Config-Management/">Config Management</a>
</li>
<li class="">
<a class="" href="../../user-guide/Docker-Integration/">Docker Integration</a>
</li>
<li class="">
<a class="" href="../../user-guide/Troubleshooting/">Troubleshooting</a>
</li>
<li class="">
<a class="" href="../../user-guide/FAQs/">FAQs</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Sources</span>
<ul class="subnav">
<li class="">
<a class="" href="../../sources/AvroFileSource/">Avro files</a>
</li>
<li class="">
<a class="" href="../../sources/CopySource/">File copy</a>
</li>
<li class="">
<a class="" href="../../sources/QueryBasedSource/">Query based</a>
</li>
<li class="">
<a class="" href="../../sources/RestApiSource/">Rest Api</a>
</li>
<li class="">
<a class="" href="../../sources/GoogleAnalyticsSource/">Google Analytics</a>
</li>
<li class="">
<a class="" href="../../sources/GoogleDriveSource/">Google Drive</a>
</li>
<li class="">
<a class="" href="../../sources/GoogleWebmaster/">Google Webmaster</a>
</li>
<li class="">
<a class="" href="../../sources/HadoopTextInputSource/">Hadoop Text Input</a>
</li>
<li class="">
<a class="" href="../../sources/HelloWorldSource/">Hello World</a>
</li>
<li class="">
<a class="" href="../../sources/HiveAvroToOrcSource/">Hive Avro-to-ORC</a>
</li>
<li class="">
<a class="" href="../../sources/HivePurgerSource/">Hive compliance purging</a>
</li>
<li class="">
<a class="" href="../../sources/SimpleJsonSource/">JSON</a>
</li>
<li class="">
<a class="" href="../../sources/KafkaSource/">Kafka</a>
</li>
<li class="">
<a class="" href="../../sources/MySQLSource/">MySQL</a>
</li>
<li class="">
<a class="" href="../../sources/OracleSource/">Oracle</a>
</li>
<li class="">
<a class="" href="../../sources/SalesforceSource/">Salesforce</a>
</li>
<li class="">
<a class="" href="../../sources/SftpSource/">SFTP</a>
</li>
<li class="">
<a class="" href="../../sources/SqlServerSource/">SQL Server</a>
</li>
<li class="">
<a class="" href="../../sources/TeradataSource/">Teradata</a>
</li>
<li class="">
<a class="" href="../../sources/WikipediaSource/">Wikipedia</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Sinks (Writers)</span>
<ul class="subnav">
<li class="">
<a class="" href="../../sinks/AvroHdfsDataWriter/">Avro HDFS</a>
</li>
<li class="">
<a class="" href="../../sinks/ParquetHdfsDataWriter/">Parquet HDFS</a>
</li>
<li class="">
<a class="" href="../../sinks/SimpleBytesWriter/">HDFS Byte array</a>
</li>
<li class="">
<a class="" href="../../sinks/ConsoleWriter/">Console</a>
</li>
<li class="">
<a class="" href="../../sinks/CouchbaseWriter/">Couchbase</a>
</li>
<li class="">
<a class="" href="../../sinks/Http/">HTTP</a>
</li>
<li class="">
<a class="" href="../../sinks/Gobblin-JDBC-Writer/">JDBC</a>
</li>
<li class="">
<a class="" href="../../sinks/Kafka/">Kafka</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Adaptors</span>
<ul class="subnav">
<li class="">
<a class="" href="../../adaptors/Gobblin-Distcp/">Gobblin Distcp</a>
</li>
<li class="">
<a class="" href="../../adaptors/Hive-Avro-To-ORC-Converter/">Hive Avro-To-Orc Converter</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Case Studies</span>
<ul class="subnav">
<li class="">
<a class="" href="../../case-studies/Kafka-HDFS-Ingestion/">Kafka-HDFS Ingestion</a>
</li>
<li class="">
<a class="" href="../../case-studies/Publishing-Data-to-S3/">Publishing Data to S3</a>
</li>
<li class="">
<a class="" href="../../case-studies/Writing-ORC-Data/">Writing ORC Data</a>
</li>
<li class="">
<a class="" href="../../case-studies/Hive-Distcp/">Hive Distcp</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Data Management</span>
<ul class="subnav">
<li class="">
<a class="" href="../../data-management/Gobblin-Retention/">Retention</a>
</li>
<li class="">
<a class="" href="../../data-management/DistcpNgEvents/">Distcp-NG events</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Metrics</span>
<ul class="subnav">
<li class="">
<a class="" href="../Gobblin-Metrics/">Quick Start</a>
</li>
<li class="">
<a class="" href="../Existing-Reporters/">Existing Reporters</a>
</li>
<li class="">
<a class="" href="../Metrics-for-Gobblin-ETL/">Metrics for Gobblin ETL</a>
</li>
<li class="">
<a class="" href="../Gobblin-Metrics-Architecture/">Gobblin Metrics Architecture</a>
</li>
<li class="">
<a class="" href="../Implementing-New-Reporters/">Implementing New Reporters</a>
</li>
<li class=" current">
<a class="current" href="./">Gobblin Metrics Performance</a>
<ul class="subnav">
<li class="toctree-l3"><a href="#table-of-contents">Table of Contents</a></li>
<li class="toctree-l3"><a href="#generalities">Generalities</a></li>
<li class="toctree-l3"><a href="#how-to-interpret-these-numbers">How to interpret these numbers</a></li>
<ul>
<li><a class="toctree-l4" href="#what-if-i-need-larger-qps">What if I need larger QPS?</a></li>
</ul>
<li class="toctree-l3"><a href="#update-metrics-performance">Update Metrics Performance</a></li>
<ul>
<li><a class="toctree-l4" href="#multiple-metric-updates-per-iteration">Multiple metric updates per iteration</a></li>
<li><a class="toctree-l4" href="#multi-threading">Multi-threading</a></li>
<li><a class="toctree-l4" href="#running-performance-tests">Running Performance Tests</a></li>
</ul>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Developer Guide</span>
<ul class="subnav">
<li class="">
<a class="" href="../../developer-guide/Customization-for-New-Source/">Customization for New Source</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Customization-for-Converter-and-Operator/">Customization for Converter and Operator</a>
</li>
<li class="">
<a class="" href="../../developer-guide/CodingStyle/">Code Style Guide</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Gobblin-Compliance-Design/">Gobblin Compliance Design</a>
</li>
<li class="">
<a class="" href="../../developer-guide/IDE-setup/">IDE setup</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Monitoring-Design/">Monitoring Design</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Documentation-Architecture/">Documentation Architecture</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Contributing/">Contributing</a>
</li>
<li class="">
<a class="" href="../../developer-guide/GobblinModules/">Gobblin Modules</a>
</li>
<li class="">
<a class="" href="../../developer-guide/HighLevelConsumer/">High Level Consumer</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Project</span>
<ul class="subnav">
<li class="">
<a class="" href="../../project/Feature-List/">Feature List</a>
</li>
<li class="">
<a class="" href="/people">Contributors and Team</a>
</li>
<li class="">
<a class="" href="../../project/Talks-and-Tech-Blogs/">Talks and Tech Blog Posts</a>
</li>
<li class="">
<a class="" href="../../project/Posts/">Posts</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Miscellaneous</span>
<ul class="subnav">
<li class="">
<a class="" href="../../miscellaneous/Camus-to-Gobblin-Migration/">Camus to Gobblin Migration</a>
</li>
<li class="">
<a class="" href="../../miscellaneous/Exactly-Once-Support/">Exactly Once Support</a>
</li>
</ul>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../..">Apache Gobblin</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="main">
<div class="section">
<h2 id="table-of-contents">Table of Contents</h2>
<div class="toc">
<ul>
<li><a href="#table-of-contents">Table of Contents</a></li>
<li><a href="#generalities">Generalities</a></li>
<li><a href="#how-to-interpret-these-numbers">How to interpret these numbers</a><ul>
<li><a href="#what-if-i-need-larger-qps">What if I need larger QPS?</a></li>
</ul>
</li>
<li><a href="#update-metrics-performance">Update Metrics Performance</a><ul>
<li><a href="#multiple-metric-updates-per-iteration">Multiple metric updates per iteration</a></li>
<li><a href="#multi-threading">Multi-threading</a></li>
<li><a href="#running-performance-tests">Running Performance Tests</a></li>
</ul>
</li>
</ul>
</div>
<h1 id="generalities">Generalities</h1>
<p>These are the main resources used by Gobblin Metrics:</p>
<ul>
<li>CPU time for updating metrics: scales with number of metrics and frequency of metric update</li>
<li>CPU time for metric emission and lifecycle management: scales with number of metrics and frequency of emission</li>
<li>Memory for storing metrics: scales with number of metrics and metric contexts</li>
<li>I/O for reporting metrics: scales with number of metrics and frequency of emission</li>
<li>External resources for metrics emission (e.g. HDFS space, Kafka queue space, etc.): scales with number of metrics and frequency of emission</li>
</ul>
<p>This page focuses on the CPU time for updating metrics, as these updates are usually in the critical performance path of an application. Each metric requires bounded memory, and having a few metrics should have no major effect on memory usage. Metrics and Metric Contexts are cleaned up when no longer needed, which further reduces this impact. Resources related to metric emission can always be reduced by reporting fewer metrics or decreasing the reporting frequency when necessary.</p>
<h1 id="how-to-interpret-these-numbers">How to interpret these numbers</h1>
<p>This document provides the maximum QPS (metric updates per second) achievable by Gobblin Metrics. If the application attempts to update metrics at a higher rate than this, the metrics will effectively throttle the application. If, on the other hand, the application updates metrics at 10% or less of the maximum QPS, the performance impact of Gobblin Metrics should be minimal.</p>
<h3 id="what-if-i-need-larger-qps">What if I need larger QPS?</h3>
<p>If your application needs a higher QPS, the recommendation is to batch metric updates. Counters and Meters offer the option to increase their values by multiple units at a time. Histograms and Timers don't offer this option, but for very high throughput applications, recording only a random sample of the values (for example, 10%) will not affect the statistics significantly, although you will have to adjust timer and histogram counts manually.</p>
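<p>As a rough illustration of batching, the sketch below buffers increments in a thread-local field and applies them to a shared counter in one call per batch. It uses <code>java.util.concurrent.atomic.LongAdder</code> as a stand-in for a counter metric; the <code>LocalBatch</code> helper and the batch size are hypothetical and are not part of Gobblin Metrics.</p>
<pre><code class="java">import java.util.concurrent.atomic.LongAdder;

public class BatchedCounterSketch {
    // Stand-in for a shared counter metric (hypothetical; not a Gobblin class).
    static final LongAdder sharedCounter = new LongAdder();

    // Hypothetical helper: buffers increments in a plain field and applies
    // them to the shared counter with one add() call per full batch.
    // Not thread-safe by design: each thread would own its own instance.
    static final class LocalBatch {
        private final long batchSize;
        private long pending = 0;

        LocalBatch(long batchSize) {
            this.batchSize = batchSize;
        }

        void increment() {
            pending++;
            if (pending == batchSize) {
                flush();
            }
        }

        // Call once more at the end so a partial batch is not lost.
        void flush() {
            if (pending != 0) {
                sharedCounter.add(pending);
                pending = 0;
            }
        }
    }

    public static void main(String[] args) {
        LocalBatch batch = new LocalBatch(100);
        for (int i = 0; i != 1050; i++) {
            batch.increment();
        }
        batch.flush();
        System.out.println(sharedCounter.sum()); // prints 1050
    }
}
</code></pre>
<p>With a batch size of 100, the shared counter is touched roughly once per 100 events instead of once per event. With a real Counter or Meter, the flush step would be the multi-unit increment mentioned above. The trade-off is that the reported value can lag by up to one batch per thread.</p>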
<h1 id="update-metrics-performance">Update Metrics Performance</h1>
<p>Metric updates are the most common interaction with Gobblin Metrics in an application. Every time a counter is increased, a meter is marked, or entries are added to histograms and timers, an update happens. As such, metric updates are the most likely to impact application performance.</p>
<p>We measured the maximum number of metric updates that can be executed per second. Performance varies by metric type, and it also depends on the depth in the Metric Context tree at which a metric is created. Metrics in the Root Metric Context are the fastest, while metrics deep in the tree are slower because they have to update all ancestors as well. The following table shows the reference max QPS in updates per second, as well as the equivalent single-update delay in nanoseconds, for each metric type on an i7 processor:</p>
<table>
<thead>
<tr>
<th>Metric</th>
<th>Root level</th>
<th>Depth: 1</th>
<th>Depth: 2</th>
<th>Depth: 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>Counter</td>
<td>76M (13ns)</td>
<td>39M (25ns)</td>
<td>29M (34ns)</td>
<td>24M (41ns)</td>
</tr>
<tr>
<td>Meter</td>
<td>11M (90ns)</td>
<td>7M (142ns)</td>
<td>4.5M (222ns)</td>
<td>3.5M (285ns)</td>
</tr>
<tr>
<td>Histogram</td>
<td>2.4M (416ns)</td>
<td>2.4M (416ns)</td>
<td>1.8M (555ns)</td>
<td>1.3M (769ns)</td>
</tr>
<tr>
<td>Timer</td>
<td>1.4M (714ns)</td>
<td>1.4M (714ns)</td>
<td>1M (1us)</td>
<td>1M (1us)</td>
</tr>
</tbody>
</table>
<h2 id="multiple-metric-updates-per-iteration">Multiple metric updates per iteration</h2>
<p>If a single thread updates multiple metrics, the average delay per iteration will be the sum of the delays of each metric taken independently. For example, if on each iteration the application updates two counters, one histogram, and one timer at the root metric context level, the total delay will be <code>13ns + 13ns + 416ns + 714ns = 1156ns</code>, for a max QPS of <code>865k</code>.</p>
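<p>The same arithmetic can be sketched in a few lines of Java. The delays are the root-level figures from the table above; the class and method names are illustrative only, and the 10% figure follows the low-impact guidance given earlier on this page.</p>
<pre><code class="java">public class UpdateBudgetSketch {
    // Per-update delays in nanoseconds, taken from the root-level column.
    static final long COUNTER_NS = 13;
    static final long HISTOGRAM_NS = 416;
    static final long TIMER_NS = 714;

    // Total delay of one iteration: the sum of the individual update delays.
    static long totalDelayNs(long... delaysNs) {
        long total = 0;
        for (long d : delaysNs) {
            total += d;
        }
        return total;
    }

    // Max iterations per second for a single thread at that total delay.
    static long maxQps(long totalDelayNs) {
        return 1_000_000_000L / totalDelayNs;
    }

    // Rate at or below which the impact should be minimal (10% of the max).
    static long lowImpactQps(long maxQps) {
        return maxQps / 10;
    }

    public static void main(String[] args) {
        long total = totalDelayNs(COUNTER_NS, COUNTER_NS, HISTOGRAM_NS, TIMER_NS);
        long max = maxQps(total);
        System.out.println(total);             // prints 1156
        System.out.println(max);               // prints 865051
        System.out.println(lowImpactQps(max)); // prints 86505
    }
}
</code></pre>
<p>Swapping in the delays from a deeper column of the table gives the corresponding budget for metrics created further down the Metric Context tree.</p>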
<h2 id="multi-threading">Multi-threading</h2>
<p>Updating metrics with different names can be parallelized efficiently, i.e. different threads updating metrics with different names will not interfere with each other. However, multiple threads updating metrics with the same names will interfere with each other, as the updates of common ancestor metrics are synchronized (to provide auto-aggregation). In experiments we observed that updating metrics with the same name from multiple threads increases the maximum QPS sub-linearly, saturating at about 3x the single-threaded QPS, i.e. the total QPS of metric updates across any number of threads will not go above 3x the numbers shown in the table above.</p>
<p>On the other hand, if each thread is updating multiple metrics, the updates might interleave with each other, potentially increasing the max total QPS. In the example with two counters, one timer, and one histogram, one thread could be updating the timer while another could be updating the histogram, reducing interference, but never exceeding the max QPS of the single most expensive metric. Note that there is no optimization in the code to produce this interleaving; it is merely an effect of synchronization, so the effect might vary.</p>
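<p>For capacity planning, the saturation described above can be turned into a rough ceiling. The sketch below assumes the "about 3x" factor observed in the experiments and treats scaling below that point as linear, which is optimistic because the measured scaling is sub-linear. The names are illustrative only.</p>
<pre><code class="java">public class SharedMetricCeilingSketch {
    // Approximate saturation factor reported for same-name metric updates.
    static final long SATURATION_FACTOR = 3;

    // Optimistic upper bound on the total QPS when several threads update
    // metrics with the same name. Actual throughput is expected to be at or
    // below this value.
    static long upperBoundQps(long singleThreadQps, int threads) {
        return singleThreadQps * Math.min(threads, SATURATION_FACTOR);
    }

    public static void main(String[] args) {
        long rootCounterQps = 76_000_000L; // root-level Counter from the table
        System.out.println(upperBoundQps(rootCounterQps, 2));  // prints 152000000
        System.out.println(upperBoundQps(rootCounterQps, 16)); // prints 228000000
    }
}
</code></pre>
<p>Adding threads beyond the saturation point does not raise the ceiling, so higher rates on a shared metric call for batching or for metrics with distinct names.</p>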
<h2 id="running-performance-tests">Running Performance Tests</h2>
<p>To run the performance tests:</p>
<pre><code class="bash">cd gobblin-metrics
../gradlew performance
</code></pre>
<p>When the run finishes, it should create a TestNG report at <code>build/gobblin-metrics/reports/tests/packages/gobblin.metrics.performance.html</code>. Formatted performance results are available on the report's Output tab.</p>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../../developer-guide/Customization-for-New-Source/" class="btn btn-neutral float-right" title="Customization for New Source">Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="../Implementing-New-Reporters/" class="btn btn-neutral" title="Implementing New Reporters"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org" rel="nofollow">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme" rel="nofollow">theme</a> provided by <a href="https://readthedocs.org" rel="nofollow">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="../Implementing-New-Reporters/" style="color: #fcfcfc;">&laquo; Previous</a></span>
<span style="margin-left: 15px"><a href="../../developer-guide/Customization-for-New-Source/" style="color: #fcfcfc">Next &raquo;</a></span>
</span>
</div>
<script>var base_url = '../..';</script>
<script src="../../js/theme.js" defer></script>
<script src="../../js/extra.js" defer></script>
<script src="../../search/main.js" defer></script>
</body>
</html>