| <!DOCTYPE HTML> |
| <html lang="en-US"> |
| <head> |
| <meta charset="UTF-8"> |
| <title>IO</title> |
| <meta name="viewport" content="width=device-width,initial-scale=1"> |
| <meta name="generator" content="Jekyll v4.3.4"> |
| <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> |
| <link rel="stylesheet" href="/css/screen.css"> |
| <link rel="icon" type="image/x-icon" href="/favicon.ico"> |
| <!--[if lt IE 9]> |
| <script src="/js/html5shiv.min.js"></script> |
| <script src="/js/respond.min.js"></script> |
| <![endif]--> |
| <!-- Matomo --> |
| <script> |
| var _paq = window._paq = window._paq || []; |
| /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ |
| _paq.push(["setDoNotTrack", true]); |
| _paq.push(["disableCookies"]); |
| _paq.push(['trackPageView']); |
| _paq.push(['enableLinkTracking']); |
| (function() { |
| var u="https://analytics.apache.org/"; |
| _paq.push(['setTrackerUrl', u+'matomo.php']); |
| _paq.push(['setSiteId', '68']); |
| var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; |
| g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); |
| })(); |
| </script> |
| <!-- End Matomo Code --> |
| </head> |
| |
| |
| <body class="wrap"> |
| <header role="banner"> |
| <nav class="mobile-nav show-on-mobiles"> |
| <ul> |
| <li class=""> |
| <a href="/">Home</a> |
| </li> |
| <li class=""> |
| <a href="/releases/"><span class="show-on-mobiles">Rel</span> |
| <span class="hide-on-mobiles">Releases</span></a> |
| </li> |
| <li class=""> |
| <a href="/docs/"><span class="show-on-mobiles">Doc</span> |
| <span class="hide-on-mobiles">Documentation</span></a> |
| </li> |
| <li class=""> |
| <a href="/talks/"><span class="show-on-mobiles">Talk</span> |
| <span class="hide-on-mobiles">Talks</span></a> |
| </li> |
| <li class=""> |
| <a href="/news/">News</a> |
| </li> |
| <li class="current"> |
| <a href="/develop/"><span class="show-on-mobiles">Dev</span> |
| <span class="hide-on-mobiles">Develop</span></a> |
| </li> |
| <li class=""> |
| <a href="/help/">Help</a> |
| </li> |
| </ul> |
| |
| </nav> |
| <div class="grid"> |
| <div class="unit one-quarter center-on-mobiles"> |
| <h1> |
| <a href="/"> |
| <span class="sr-only">Apache ORC</span> |
| <img src="/img/logo.png" width="249" height="101" alt="ORC Logo"> |
| </a> |
| </h1> |
| </div> |
| <nav class="main-nav unit three-quarters hide-on-mobiles"> |
| <ul> |
| <li class=""> |
| <a href="/">Home</a> |
| </li> |
| <li class=""> |
| <a href="/releases/"><span class="show-on-mobiles">Rel</span> |
| <span class="hide-on-mobiles">Releases</span></a> |
| </li> |
| <li class=""> |
| <a href="/docs/"><span class="show-on-mobiles">Doc</span> |
| <span class="hide-on-mobiles">Documentation</span></a> |
| </li> |
| <li class=""> |
| <a href="/talks/"><span class="show-on-mobiles">Talk</span> |
| <span class="hide-on-mobiles">Talks</span></a> |
| </li> |
| <li class=""> |
| <a href="/news/">News</a> |
| </li> |
| <li class="current"> |
| <a href="/develop/"><span class="show-on-mobiles">Dev</span> |
| <span class="hide-on-mobiles">Develop</span></a> |
| </li> |
| <li class=""> |
| <a href="/help/">Help</a> |
| </li> |
| </ul> |
| |
| </nav> |
| </div> |
| </header> |
| |
| |
| <section class="standalone"> |
| <div class="grid"> |
| |
| <div class="unit whole"> |
| <article> |
| <h1>IO</h1> |
| <ul> |
| <li><a href="#Background">Background</a> |
| <ul> |
| <li><a href="#SeekvsRead">Seek vs Read</a></li> |
| <li><a href="#ORCRead">ORC Read</a></li> |
| </ul> |
| </li> |
| <li><a href="#ReadOptimization">Read Optimization</a> |
| <ul> |
| <li><a href="#Approach">Approach</a></li> |
| <li><a href="#Scope">Scope</a></li> |
| <li><a href="#Benchmarks">Benchmarks</a> |
| <ul> |
| <li><a href="#LocalFS">Local FS</a></li> |
| <li><a href="#AWSS3">AWS S3</a></li> |
| </ul> |
| </li> |
| <li><a href="#Summary">Summary</a></li> |
| </ul> |
| </li> |
| </ul> |
| |
| <h2 id="background-">Background <a id="Background"></a></h2> |
| |
| <p>We are moving our workloads from HDFS to AWS S3. As part of this activity we wanted to understand the performance |
| characteristics and costs of using S3.</p> |
| |
| <h3 id="seek-vs-read-">Seek vs Read <a id="SeekvsRead"></a></h3> |
| |
| <p>One particular scenario that stood out in our performance testing was Seek vs Read when dealing with S3.</p> |
| |
| <p>In this test we are trying to read through a file</p> |
| |
| <ul> |
| <li>Seek to Point A in the file read X bytes</li> |
| <li>Move to Point B in the file that is A + X + Y |
| <ul> |
| <li>This is accomplished as another seek or as a read</li> |
| <li>We will leave Y variable to determine when this is best</li> |
| </ul> |
| </li> |
| <li>Read X bytes</li> |
| </ul> |
| |
| <p><img src="/img/seekvsread.png" alt="Seek vs Read" /></p> |
| |
| <p>Observations:</p> |
| |
| <ul> |
| <li>We could clearly see that a read is more performant than seek when dealing with steps/gaps smaller than 4 MB. |
| <ul> |
| <li>At 4 MB read is faster by ~ 11%</li> |
| <li>At 1 MB read is faster by ~ 20%</li> |
| </ul> |
| </li> |
| <li>Reads are also cheaper as we perform a single GET instead of multiple GETs from <a href="https://aws.amazon.com/s3/pricing/">AWS S3 Pricing</a> |
| <ul> |
| <li>Cost for GET: $0.0004</li> |
| <li>Cost for Data Retrieval to the same region AWS EKS: $0.0000</li> |
| </ul> |
| </li> |
| </ul> |
| |
| <h3 id="orc-read-">ORC Read <a id="ORCRead"></a></h3> |
| |
| <p>Based on the above performance penalty when dealing with multiple seeks over small gaps, we measured the performance of |
| ORC read on a file.</p> |
| |
| <p>File details:</p> |
| |
| <ul> |
| <li>Size ~ 21 MB</li> |
| <li>Column Count: ~ 400</li> |
| <li>Row Count: ~ 65K</li> |
| </ul> |
| |
| <table> |
| <thead> |
| <tr> |
| <th style="text-align: left">Read Type</th> |
| <th style="text-align: right">Duration</th> |
| <th style="text-align: left">Unit</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td style="text-align: left">All Columns</td> |
| <td style="text-align: right">1.075</td> |
| <td style="text-align: left">s</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">Alternate Columns</td> |
| <td style="text-align: right">6.489</td> |
| <td style="text-align: left">s</td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <p>Observations:</p> |
| |
| <ul> |
| <li>We can clearly see that we pay a significant penalty when reading alternate columns, which in the current |
| implementation of ORC translates to multiple GET calls on AWS S3</li> |
| <li>While the impact of penalty will be less significant in large reads, it will incur overheads both in terms of time and |
| cost</li> |
| </ul> |
| |
| <h2 id="read-optimization-">Read Optimization <a id="ReadOptimization"></a></h2> |
| |
| <h3 id="approach-">Approach <a id="Approach"></a></h3> |
| |
| <p>The following optimizations are planned:</p> |
| |
| <ul> |
| <li><strong>orc.min.disk.seek.size</strong> is a value in bytes: When trying to determine a single read, if the gap between two reads |
| is smaller than this then it is combined into a single read.</li> |
| <li><strong>orc.min.disk.seek.size.tolerance</strong> is a fractional input: If the extra bytes read is greater than this fraction of |
| the required bytes, then we drop the extra bytes from memory.</li> |
| <li>We can further consider adding an optimization for the complete stripe in case the stripe size is smaller than |
| <code class="language-plaintext highlighter-rouge">orc.min.disk.seek.size</code></li> |
| </ul> |
| |
| <h3 id="scope-">Scope <a id="Scope"></a></h3> |
| |
| <p>Different types of IO takes place in ORC today.</p> |
| |
| <ul> |
| <li>Reading of File Footer: Unchanged</li> |
| <li>Reading of Stripe Footer: Unchanged</li> |
| <li>Reading of Stripe Index information: Optimized</li> |
| <li>Reading of Stripe Data: Optimized</li> |
| </ul> |
| |
| <p>Each of the above happens at different stages of the read. The current implementation optimizes reads that happen using |
| the <a href="https://github.com/apache/orc/tree/main/java/core/src/java/org/apache/orc/DataReader.java">DataReader</a> interface.</p> |
| |
| <p>This does not:</p> |
| |
| <ul> |
| <li>Optimize the read of the file/stripe footer</li> |
| <li>Reads across multiple stripes</li> |
| </ul> |
| |
| <h3 id="benchmarks-">Benchmarks <a id="Benchmarks"></a></h3> |
| |
| <h4 id="local-fs-">Local FS <a id="LocalFS"></a></h4> |
| |
| <p>This benchmark is run on the local filesystem with NVMe SSD, so it has very different performance characteristics to AWS |
| S3.</p> |
| |
| <p>The purpose of this benchmark is to ascertain if we have added any significant penalties in the ORC code by adding |
| <code class="language-plaintext highlighter-rouge">minSeekSize</code> and <code class="language-plaintext highlighter-rouge">extraByteTolerance</code>.</p> |
| |
| <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java <span class="nt">-jar</span> java/bench/core/target/orc-benchmarks-core-<span class="k">*</span><span class="nt">-uber</span>.jar chunk_read |
| </code></pre></div></div> |
| |
| <table> |
| <thead> |
| <tr> |
| <th style="text-align: left">(alt)</th> |
| <th style="text-align: right">(cols)</th> |
| <th style="text-align: right">(byteTol)</th> |
| <th style="text-align: right">(minSeek)</th> |
| <th style="text-align: left">Mode</th> |
| <th style="text-align: right">Cnt</th> |
| <th style="text-align: right">Score</th> |
| <th style="text-align: left">Sign</th> |
| <th style="text-align: right">Error</th> |
| <th style="text-align: left">Units</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td style="text-align: left">true</td> |
| <td style="text-align: right">128</td> |
| <td style="text-align: right">0.0</td> |
| <td style="text-align: right">0</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">20</td> |
| <td style="text-align: right">0.352</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.006</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">true</td> |
| <td style="text-align: right">128</td> |
| <td style="text-align: right">0.0</td> |
| <td style="text-align: right">4194304</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">20</td> |
| <td style="text-align: right">0.357</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.002</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">true</td> |
| <td style="text-align: right">128</td> |
| <td style="text-align: right">10.0</td> |
| <td style="text-align: right">4194304</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">20</td> |
| <td style="text-align: right">0.349</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.002</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">false</td> |
| <td style="text-align: right">128</td> |
| <td style="text-align: right">0.0</td> |
| <td style="text-align: right">0</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">20</td> |
| <td style="text-align: right">0.667</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.007</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">false</td> |
| <td style="text-align: right">128</td> |
| <td style="text-align: right">0.0</td> |
| <td style="text-align: right">4194304</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">20</td> |
| <td style="text-align: right">0.673</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.004</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">false</td> |
| <td style="text-align: right">128</td> |
| <td style="text-align: right">10.0</td> |
| <td style="text-align: right">4194304</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">20</td> |
| <td style="text-align: right">0.671</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.005</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <p>Observations/Details:</p> |
| |
| <ul> |
| <li><strong>Input File details</strong>: |
| <ul> |
| <li>Rows: 65536</li> |
| <li>Columns: 128</li> |
| <li>FileSize: ~ 72 MB</li> |
| </ul> |
| </li> |
| <li>Full Read (alternate = false) |
| <ul> |
| <li>No significant difference between the options as expected</li> |
| </ul> |
| </li> |
| <li>Alternate Read (alternate = true) |
| <ul> |
| <li>No significant difference between the options given the small file size and performance of local disk</li> |
| <li>This also calls out that the recommended minSeekSize should be determined for each platform e.g. HDFS, S3, etc</li> |
| </ul> |
| </li> |
| </ul> |
| |
| <h4 id="aws-s3-">AWS S3 <a id="AWSS3"></a></h4> |
| |
| <p>In this benchmark we brought up an EKS Container in the same region as the AWS S3 bucket to test the performance of the |
| patch.</p> |
| |
| <table> |
| <thead> |
| <tr> |
| <th style="text-align: left">(alternate)</th> |
| <th style="text-align: right">(byteTol)</th> |
| <th style="text-align: right">(minSeekSize)</th> |
| <th style="text-align: left">Mode</th> |
| <th style="text-align: right">Cnt</th> |
| <th style="text-align: right">Score</th> |
| <th style="text-align: left">Sign</th> |
| <th style="text-align: right">Error</th> |
| <th style="text-align: left">Units</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td style="text-align: left">FALSE</td> |
| <td style="text-align: right">0.0</td> |
| <td style="text-align: right">0</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">5</td> |
| <td style="text-align: right">1.837</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.089</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">FALSE</td> |
| <td style="text-align: right">0.0</td> |
| <td style="text-align: right">4194304</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">5</td> |
| <td style="text-align: right">1.919</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.11</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">FALSE</td> |
| <td style="text-align: right">10.0</td> |
| <td style="text-align: right">4194304</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">5</td> |
| <td style="text-align: right">1.895</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.191</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">TRUE</td> |
| <td style="text-align: right">0.0</td> |
| <td style="text-align: right">0</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">5</td> |
| <td style="text-align: right">5.8</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">1.132</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">TRUE</td> |
| <td style="text-align: right">0.0</td> |
| <td style="text-align: right">4194304</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">5</td> |
| <td style="text-align: right">1.479</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.197</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| <tr> |
| <td style="text-align: left">TRUE</td> |
| <td style="text-align: right">10.0</td> |
| <td style="text-align: right">4194304</td> |
| <td style="text-align: left">avgt</td> |
| <td style="text-align: right">5</td> |
| <td style="text-align: right">1.435</td> |
| <td style="text-align: left">±</td> |
| <td style="text-align: right">0.176</td> |
| <td style="text-align: left">s/op</td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <p>Observations/Details:</p> |
| |
| <ul> |
| <li><strong>Input File details</strong>: |
| <ul> |
| <li>Rows: 65536</li> |
| <li>Columns: 128</li> |
| <li>FileSize: ~ 72 MB</li> |
| </ul> |
| </li> |
| <li>Full Read (alternate = false) |
| <ul> |
| <li>No significant difference between the options as expected</li> |
| </ul> |
| </li> |
| <li>Alternate Read (alternate = true) |
| <ul> |
| <li>We get a significant boost in performance 5.8s without optimization to 1.5s with optimization giving us a time |
| reduction of ~ 75 %</li> |
| <li>This also gives us a cost saving as 64 GET one for each column per stripe have been replaced with a single GET</li> |
| <li>We can see a marginal improvement ~ 3% when choosing to retain extra bytes (extraByteTolerance=10.0) as compared to |
| (extraByteTolerance=0.0) which performs additional work of dropping the extra bytes from memory.</li> |
| </ul> |
| </li> |
| </ul> |
| |
| <h3 id="summary-">Summary <a id="Summary"></a></h3> |
| |
| <p>Based on the benchmarks the following is recommended for ORC in AWS S3:</p> |
| |
| <ul> |
| <li><code class="language-plaintext highlighter-rouge">orc.min.disk.seek.size</code> is set to <code class="language-plaintext highlighter-rouge">4194304</code> (4 MB)</li> |
| <li><code class="language-plaintext highlighter-rouge">orc.min.disk.seek.size.tolerance</code> is set to value that is acceptable based on the memory usage constraints. When set |
| to <code class="language-plaintext highlighter-rouge">0.0</code> it will always do the extra work of dropping the extra bytes.</li> |
| </ul> |
| |
| |
| </article> |
| </div> |
| |
| <div class="clear"></div> |
| |
| </div> |
| </section> |
| |
| |
| <footer role="contentinfo"> |
| <p style="margin-left: 20px; margin-right; 20px; text-align: center">The contents of this website are © 2025 |
| <a href="https://www.apache.org/">Apache Software Foundation</a> |
| under the terms of the <a |
| href="https://www.apache.org/licenses/LICENSE-2.0.html"> |
| Apache License v2</a>. Apache ORC and its logo are trademarks |
| of the Apache Software Foundation.</p> |
| </footer> |
| |
| <script> |
| var anchorForId = function (id) { |
| var anchor = document.createElement("a"); |
| anchor.className = "header-link"; |
| anchor.href = "#" + id; |
| anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>"; |
| anchor.title = "Permalink"; |
| return anchor; |
| }; |
| |
| var linkifyAnchors = function (level, containingElement) { |
| var headers = containingElement.getElementsByTagName("h" + level); |
| for (var h = 0; h < headers.length; h++) { |
| var header = headers[h]; |
| |
| if (typeof header.id !== "undefined" && header.id !== "") { |
| header.appendChild(anchorForId(header.id)); |
| } |
| } |
| }; |
| |
| document.onreadystatechange = function () { |
| if (this.readyState === "complete") { |
| var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0]; |
| if (!contentBlock) { |
| return; |
| } |
| for (var level = 1; level <= 6; level++) { |
| linkifyAnchors(level, contentBlock); |
| } |
| } |
| }; |
| </script> |
| |
| |
| </body> |
| </html> |