blob: a0fe4f3ff4274544dc254a805a9f01ee8f6f6b3d [file] [log] [blame]
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>IO</title>
<meta name="viewport" content="width=device-width,initial-scale=1">
<meta name="generator" content="Jekyll v4.3.4">
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
<link rel="stylesheet" href="/css/screen.css">
<link rel="icon" type="image/x-icon" href="/favicon.ico">
<!--[if lt IE 9]>
<script src="/js/html5shiv.min.js"></script>
<script src="/js/respond.min.js"></script>
<![endif]-->
<!-- Matomo -->
<script>
var _paq = window._paq = window._paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
_paq.push(["setDoNotTrack", true]);
_paq.push(["disableCookies"]);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="https://analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '68']);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script>
<!-- End Matomo Code -->
</head>
<body class="wrap">
<header role="banner">
<nav class="mobile-nav show-on-mobiles">
<ul>
<li class="">
<a href="/">Home</a>
</li>
<li class="">
<a href="/releases/"><span class="show-on-mobiles">Rel</span>
<span class="hide-on-mobiles">Releases</span></a>
</li>
<li class="">
<a href="/docs/"><span class="show-on-mobiles">Doc</span>
<span class="hide-on-mobiles">Documentation</span></a>
</li>
<li class="">
<a href="/talks/"><span class="show-on-mobiles">Talk</span>
<span class="hide-on-mobiles">Talks</span></a>
</li>
<li class="">
<a href="/news/">News</a>
</li>
<li class="current">
<a href="/develop/"><span class="show-on-mobiles">Dev</span>
<span class="hide-on-mobiles">Develop</span></a>
</li>
<li class="">
<a href="/help/">Help</a>
</li>
</ul>
</nav>
<div class="grid">
<div class="unit one-quarter center-on-mobiles">
<h1>
<a href="/">
<span class="sr-only">Apache ORC</span>
<img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
</a>
</h1>
</div>
<nav class="main-nav unit three-quarters hide-on-mobiles">
<ul>
<li class="">
<a href="/">Home</a>
</li>
<li class="">
<a href="/releases/"><span class="show-on-mobiles">Rel</span>
<span class="hide-on-mobiles">Releases</span></a>
</li>
<li class="">
<a href="/docs/"><span class="show-on-mobiles">Doc</span>
<span class="hide-on-mobiles">Documentation</span></a>
</li>
<li class="">
<a href="/talks/"><span class="show-on-mobiles">Talk</span>
<span class="hide-on-mobiles">Talks</span></a>
</li>
<li class="">
<a href="/news/">News</a>
</li>
<li class="current">
<a href="/develop/"><span class="show-on-mobiles">Dev</span>
<span class="hide-on-mobiles">Develop</span></a>
</li>
<li class="">
<a href="/help/">Help</a>
</li>
</ul>
</nav>
</div>
</header>
<section class="standalone">
<div class="grid">
<div class="unit whole">
<article>
<h1>IO</h1>
<ul>
<li><a href="#Background">Background</a>
<ul>
<li><a href="#SeekvsRead">Seek vs Read</a></li>
<li><a href="#ORCRead">ORC Read</a></li>
</ul>
</li>
<li><a href="#ReadOptimization">Read Optimization</a>
<ul>
<li><a href="#Approach">Approach</a></li>
<li><a href="#Scope">Scope</a></li>
<li><a href="#Benchmarks">Benchmarks</a>
<ul>
<li><a href="#LocalFS">Local FS</a></li>
<li><a href="#AWSS3">AWS S3</a></li>
</ul>
</li>
<li><a href="#Summary">Summary</a></li>
</ul>
</li>
</ul>
<h2 id="background-">Background <a id="Background"></a></h2>
<p>We are moving our workloads from HDFS to AWS S3. As part of this activity we wanted to understand the performance
characteristics and costs of using S3.</p>
<h3 id="seek-vs-read-">Seek vs Read <a id="SeekvsRead"></a></h3>
<p>One particular scenario that stood out in our performance testing was Seek vs Read when dealing with S3.</p>
<p>In this test we are trying to read through a file</p>
<ul>
<li>Seek to Point A in the file read X bytes</li>
<li>Move to Point B in the file that is A + X + Y
<ul>
<li>This is accomplished as another seek or as a read</li>
<li>We will leave Y variable to determine when this is best</li>
</ul>
</li>
<li>Read X bytes</li>
</ul>
<p><img src="/img/seekvsread.png" alt="Seek vs Read" /></p>
<p>Observations:</p>
<ul>
<li>We could clearly see that a read is more performant than seek when dealing with steps/gaps smaller than 4 MB.
<ul>
<li>At 4 MB read is faster by ~ 11%</li>
<li>At 1 MB read is faster by ~ 20%</li>
</ul>
</li>
<li>Reads are also cheaper as we perform a single GET instead of multiple GETs from <a href="https://aws.amazon.com/s3/pricing/">AWS S3 Pricing</a>
<ul>
<li>Cost for GET: $0.0004</li>
<li>Cost for Data Retrieval to the same region AWS EKS: $0.0000</li>
</ul>
</li>
</ul>
<h3 id="orc-read-">ORC Read <a id="ORCRead"></a></h3>
<p>Based on the above performance penalty when dealing with multiple seeks over small gaps, we measured the performance of
ORC read on a file.</p>
<p>File details:</p>
<ul>
<li>Size ~ 21 MB</li>
<li>Column Count: ~ 400</li>
<li>Row Count: ~ 65K</li>
</ul>
<table>
<thead>
<tr>
<th style="text-align: left">Read Type</th>
<th style="text-align: right">Duration</th>
<th style="text-align: left">Unit</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">All Columns</td>
<td style="text-align: right">1.075</td>
<td style="text-align: left">s</td>
</tr>
<tr>
<td style="text-align: left">Alternate Columns</td>
<td style="text-align: right">6.489</td>
<td style="text-align: left">s</td>
</tr>
</tbody>
</table>
<p>Observations:</p>
<ul>
<li>We can clearly see that we pay a significant penalty when reading alternate columns, which in the current
implementation of ORC translates to multiple GET calls on AWS S3</li>
<li>While the impact of penalty will be less significant in large reads, it will incur overheads both in terms of time and
cost</li>
</ul>
<h2 id="read-optimization-">Read Optimization <a id="ReadOptimization"></a></h2>
<h3 id="approach-">Approach <a id="Approach"></a></h3>
<p>The following optimizations are planned:</p>
<ul>
<li><strong>orc.min.disk.seek.size</strong> is a value in bytes: When trying to determine a single read, if the gap between two reads
is smaller than this then it is combined into a single read.</li>
<li><strong>orc.min.disk.seek.size.tolerance</strong> is a fractional input: If the extra bytes read is greater than this fraction of
the required bytes, then we drop the extra bytes from memory.</li>
<li>We can further consider adding an optimization for the complete stripe in case the stripe size is smaller than
<code class="language-plaintext highlighter-rouge">orc.min.disk.seek.size</code></li>
</ul>
<h3 id="scope-">Scope <a id="Scope"></a></h3>
<p>Different types of IO takes place in ORC today.</p>
<ul>
<li>Reading of File Footer: Unchanged</li>
<li>Reading of Stripe Footer: Unchanged</li>
<li>Reading of Stripe Index information: Optimized</li>
<li>Reading of Stripe Data: Optimized</li>
</ul>
<p>Each of the above happens at different stages of the read. The current implementation optimizes reads that happen using
the <a href="https://github.com/apache/orc/tree/main/java/core/src/java/org/apache/orc/DataReader.java">DataReader</a> interface.</p>
<p>This does not:</p>
<ul>
<li>Optimize the read of the file/stripe footer</li>
<li>Reads across multiple stripes</li>
</ul>
<h3 id="benchmarks-">Benchmarks <a id="Benchmarks"></a></h3>
<h4 id="local-fs-">Local FS <a id="LocalFS"></a></h4>
<p>This benchmark is run on the local filesystem with NVMe SSD, so it has very different performance characteristics to AWS
S3.</p>
<p>The purpose of this benchmark is to ascertain if we have added any significant penalties in the ORC code by adding
<code class="language-plaintext highlighter-rouge">minSeekSize</code> and <code class="language-plaintext highlighter-rouge">extraByteTolerance</code>.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java <span class="nt">-jar</span> java/bench/core/target/orc-benchmarks-core-<span class="k">*</span><span class="nt">-uber</span>.jar chunk_read
</code></pre></div></div>
<table>
<thead>
<tr>
<th style="text-align: left">(alt)</th>
<th style="text-align: right">(cols)</th>
<th style="text-align: right">(byteTol)</th>
<th style="text-align: right">(minSeek)</th>
<th style="text-align: left">Mode</th>
<th style="text-align: right">Cnt</th>
<th style="text-align: right">Score</th>
<th style="text-align: left">Sign</th>
<th style="text-align: right">Error</th>
<th style="text-align: left">Units</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">true</td>
<td style="text-align: right">128</td>
<td style="text-align: right">0.0</td>
<td style="text-align: right">0</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">20</td>
<td style="text-align: right">0.352</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.006</td>
<td style="text-align: left">s/op</td>
</tr>
<tr>
<td style="text-align: left">true</td>
<td style="text-align: right">128</td>
<td style="text-align: right">0.0</td>
<td style="text-align: right">4194304</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">20</td>
<td style="text-align: right">0.357</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.002</td>
<td style="text-align: left">s/op</td>
</tr>
<tr>
<td style="text-align: left">true</td>
<td style="text-align: right">128</td>
<td style="text-align: right">10.0</td>
<td style="text-align: right">4194304</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">20</td>
<td style="text-align: right">0.349</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.002</td>
<td style="text-align: left">s/op</td>
</tr>
<tr>
<td style="text-align: left">false</td>
<td style="text-align: right">128</td>
<td style="text-align: right">0.0</td>
<td style="text-align: right">0</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">20</td>
<td style="text-align: right">0.667</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.007</td>
<td style="text-align: left">s/op</td>
</tr>
<tr>
<td style="text-align: left">false</td>
<td style="text-align: right">128</td>
<td style="text-align: right">0.0</td>
<td style="text-align: right">4194304</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">20</td>
<td style="text-align: right">0.673</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.004</td>
<td style="text-align: left">s/op</td>
</tr>
<tr>
<td style="text-align: left">false</td>
<td style="text-align: right">128</td>
<td style="text-align: right">10.0</td>
<td style="text-align: right">4194304</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">20</td>
<td style="text-align: right">0.671</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.005</td>
<td style="text-align: left">s/op</td>
</tr>
</tbody>
</table>
<p>Observations/Details:</p>
<ul>
<li><strong>Input File details</strong>:
<ul>
<li>Rows: 65536</li>
<li>Columns: 128</li>
<li>FileSize: ~ 72 MB</li>
</ul>
</li>
<li>Full Read (alternate = false)
<ul>
<li>No significant difference between the options as expected</li>
</ul>
</li>
<li>Alternate Read (alternate = true)
<ul>
<li>No significant difference between the options given the small file size and performance of local disk</li>
<li>This also calls out that the recommended minSeekSize should be determined for each platform e.g. HDFS, S3, etc</li>
</ul>
</li>
</ul>
<h4 id="aws-s3-">AWS S3 <a id="AWSS3"></a></h4>
<p>In this benchmark we brought up an EKS Container in the same region as the AWS S3 bucket to test the performance of the
patch.</p>
<table>
<thead>
<tr>
<th style="text-align: left">(alternate)</th>
<th style="text-align: right">(byteTol)</th>
<th style="text-align: right">(minSeekSize)</th>
<th style="text-align: left">Mode</th>
<th style="text-align: right">Cnt</th>
<th style="text-align: right">Score</th>
<th style="text-align: left">Sign</th>
<th style="text-align: right">Error</th>
<th style="text-align: left">Units</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">FALSE</td>
<td style="text-align: right">0.0</td>
<td style="text-align: right">0</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">5</td>
<td style="text-align: right">1.837</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.089</td>
<td style="text-align: left">s/op</td>
</tr>
<tr>
<td style="text-align: left">FALSE</td>
<td style="text-align: right">0.0</td>
<td style="text-align: right">4194304</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">5</td>
<td style="text-align: right">1.919</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.11</td>
<td style="text-align: left">s/op</td>
</tr>
<tr>
<td style="text-align: left">FALSE</td>
<td style="text-align: right">10.0</td>
<td style="text-align: right">4194304</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">5</td>
<td style="text-align: right">1.895</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.191</td>
<td style="text-align: left">s/op</td>
</tr>
<tr>
<td style="text-align: left">TRUE</td>
<td style="text-align: right">0.0</td>
<td style="text-align: right">0</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">5</td>
<td style="text-align: right">5.8</td>
<td style="text-align: left">±</td>
<td style="text-align: right">1.132</td>
<td style="text-align: left">s/op</td>
</tr>
<tr>
<td style="text-align: left">TRUE</td>
<td style="text-align: right">0.0</td>
<td style="text-align: right">4194304</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">5</td>
<td style="text-align: right">1.479</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.197</td>
<td style="text-align: left">s/op</td>
</tr>
<tr>
<td style="text-align: left">TRUE</td>
<td style="text-align: right">10.0</td>
<td style="text-align: right">4194304</td>
<td style="text-align: left">avgt</td>
<td style="text-align: right">5</td>
<td style="text-align: right">1.435</td>
<td style="text-align: left">±</td>
<td style="text-align: right">0.176</td>
<td style="text-align: left">s/op</td>
</tr>
</tbody>
</table>
<p>Observations/Details:</p>
<ul>
<li><strong>Input File details</strong>:
<ul>
<li>Rows: 65536</li>
<li>Columns: 128</li>
<li>FileSize: ~ 72 MB</li>
</ul>
</li>
<li>Full Read (alternate = false)
<ul>
<li>No significant difference between the options as expected</li>
</ul>
</li>
<li>Alternate Read (alternate = true)
<ul>
<li>We get a significant boost in performance 5.8s without optimization to 1.5s with optimization giving us a time
reduction of ~ 75 %</li>
<li>This also gives us a cost saving as 64 GET one for each column per stripe have been replaced with a single GET</li>
<li>We can see a marginal improvement ~ 3% when choosing to retain extra bytes (extraByteTolerance=10.0) as compared to
(extraByteTolerance=0.0) which performs additional work of dropping the extra bytes from memory.</li>
</ul>
</li>
</ul>
<h3 id="summary-">Summary <a id="Summary"></a></h3>
<p>Based on the benchmarks the following is recommended for ORC in AWS S3:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">orc.min.disk.seek.size</code> is set to <code class="language-plaintext highlighter-rouge">4194304</code> (4 MB)</li>
<li><code class="language-plaintext highlighter-rouge">orc.min.disk.seek.size.tolerance</code> is set to value that is acceptable based on the memory usage constraints. When set
to <code class="language-plaintext highlighter-rouge">0.0</code> it will always do the extra work of dropping the extra bytes.</li>
</ul>
</article>
</div>
<div class="clear"></div>
</div>
</section>
<footer role="contentinfo">
<p style="margin-left: 20px; margin-right; 20px; text-align: center">The contents of this website are &copy;&nbsp;2025
<a href="https://www.apache.org/">Apache Software Foundation</a>
under the terms of the <a
href="https://www.apache.org/licenses/LICENSE-2.0.html">
Apache&nbsp;License&nbsp;v2</a>. Apache ORC and its logo are trademarks
of the Apache Software Foundation.</p>
</footer>
<script>
var anchorForId = function (id) {
var anchor = document.createElement("a");
anchor.className = "header-link";
anchor.href = "#" + id;
anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
anchor.title = "Permalink";
return anchor;
};
var linkifyAnchors = function (level, containingElement) {
var headers = containingElement.getElementsByTagName("h" + level);
for (var h = 0; h < headers.length; h++) {
var header = headers[h];
if (typeof header.id !== "undefined" && header.id !== "") {
header.appendChild(anchorForId(header.id));
}
}
};
document.onreadystatechange = function () {
if (this.readyState === "complete") {
var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
if (!contentBlock) {
return;
}
for (var level = 1; level <= 6; level++) {
linkifyAnchors(level, contentBlock);
}
}
};
</script>
</body>
</html>