| --- |
| layout: post |
| title: 'Offheaping the Read Path in Apache HBase: Part 1 of 2' |
| date: '2015-12-17T00:00:00+00:00' |
| categories: hbase |
| --- |
| <p> <em style="line-height: 15.3045px; widows: 1;">by HBase Committers Anoop Sam John, <span style="line-height: normal;">Ramkrishna S Vasudevan, and </span><span style="line-height: normal;">Michael Stack</span></em></p>
|
| <p><i>Part one of a <a href="https://blogs.apache.org/hbase/entry/offheaping_the_read_path_in1">two part</a> blog.</i></p>
|
| <p> </p>
|
| <h2 dir="ltr" style="margin-top: 2pt; margin-bottom: 0pt; widows: 1; line-height: 1.295;"><span style="font-weight: 400; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Introduction</span><span style="font-family: verdana, arial, 'Bitstream Vera Sans', helvetica, sans-serif; font-size: 13px; font-weight: normal; line-height: normal;"></span></h2>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">By caching more data in memory, the read latency (and throughput) can be greatly improved. Apache HBase has two layers of data caching. There is what we call “L1” caching, our first caching tier – which caches data in an on heap LRU cache -- and then there is an optional “ L2” second cache tier (aka </span><a href="https://issues.apache.org/jira/browse/HBASE-7404"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">BucketCache</span></a><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">), which can be configured to be on heap or off heap. </span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">BucketCache can be configured to operate in one of three different modes: </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">heap</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">, </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">offheap</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">, and </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">file</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">. Regardless of operating mode, the BucketCache manages areas of memory called “buckets” in which it holds cached HBase </span><a href="https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/HFile.html"><span style="text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">HFile</span></a><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> blocks. Each bucket is created with a target block size. The </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">heap</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">-based implementation creates these buckets on the JVM heap; the </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">offheap</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> implementation uses DirectByteByffers to host buckets outside of the JVM heap; </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">file </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">mode expects a path to a file on the filesystem wherein the buckets are created. The </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">file</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> mode is intended for use with a low-latency backing store – an in-memory filesystem, or perhaps a file sitting on SSD storage. It uses frequency of block access to inform utilization, just like HBase’s default </span><a href="https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/LruBlockCache.html"><span style="text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">LruBlockCache</span></a><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">, and has its same single-access, multi-access, and in-memory breakdown of 25%, 50%, 25%. Also like the default cache, block eviction is managed using an </span><a href="https://en.wikipedia.org/wiki/Cache_algorithms#LRU"><span style="text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">LRU</span></a><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> algorithm.</span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">The on heap cache is limited by the maximum heap memory for the java process. If we give too much heap memory over to cache, we may starve other users of memory bringing on more GC. At extremes we could provoke Full GC pauses and so on up to </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">OutOfMemoryErrors</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> causing RegionServer restarts and attendant cluster turbulence. Keeping the bulk of the cache off heap takes pressure off the GC.</span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">As stated above, we already have the “L2” BucketCache which can be configured to run off heap. But one downside of the original implementation was that when we read rows from the BucketCache, we had to copy the block which contained the row(s) onto the on heap area from the off heap buckets. This was due to the assumption, riddled throughout the code base, that the fundamental KeyValue HBase Type, was always backed by an on heap byte array. An HFile block has one or more KeyValues serialized into it. The default size for an HFile block is 64 KB (configurable, of course). An HFile block is what HBase reads from HDFS on each seek to pull in data into cache regardless of what the actual KeyValue size is (if a KeyValue is > 64 KB in size, that becomes the size of the HFile block we fish out of HDFS). 64 KB then is the increment in which we cache. So reading even one KeyValue from off heap, required the allocation of 64 KBs of contiguous on heap memory and then a copy of the bytes from off heap to on heap. This constant churn of bringing the off heap blocks on heap generated lots of garbage and in turn GC (See the GC spiral already noted above).</span></p>
|
| <p><a href="https://issues.apache.org/jira/browse/HBASE-11425" style="widows: 1;"><span style="text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">HBASE-11425 Cell/DBB end-to-end on the read-path</span></a><span style="widows: 1; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> changed HBase core so we can serve requests directly from off heap memory. A prepatory project converted much of HBase core to work with a new </span><span style="widows: 1; font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Cell </span><span style="widows: 1; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Interface in place of our fundamental KeyValue type. The Cell Interface has API to refer to all the particles of the old KeyValue Type -- the row key, the column family, the column qualifier, etc. -- but the implementation goes unspecified. As part of HBASE-11425, an implementation of Cell was made that could reference off heap memory and then this instance was plumbed-in throughout the HBase read path. The result was substantial savings in heap allocations and data copying as well as improved read performance. Below is detail on the changes made in HBase core. <a href="https://blogs.apache.org/hbase/entry/offheaping_the_read_path_in1">Part two</a> of this blog has some comparisons that show how this work has improved HBase read.</span></p><span id="docs-internal-guid-c4444b21-b206-a773-5aef-4c33b0c8eea4" style="line-height: normal; widows: 1;"><span id="docs-internal-guid-c4444b21-b20a-57f9-7f3b-be3371ceea9e">
|
| <h2 dir="ltr" style="margin-top: 2pt; margin-bottom: 0pt; line-height: 1.295;"><span style="font-weight: 400; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><br class="Apple-interchange-newline" />Selection of data structure for off heap storage</span></h2>
|
| <p dir="ltr" style="line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">HBASE-11425 is about serving data directly from off heap buckets out of BucketCache without the need for copying. During reads, we parse individual Cell components multiple times decoding lengths and content of various particles (key, value, tag). Cells are frequently compared for proper sorting and ordering so it is important that the overhead added by the new refactor is minimal and that the cost is similar to what the current byte array backed Cell gives us.</span></p>
|
| <p dir="ltr" style="line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Right now the BucketCache has its buckets in the form of NIO </span><a href="http://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html"><span style="text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">ByteBuffers</span></a><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">. NIO buffer reads are known to be slower because of buffer boundary checks and so on. Netty’s </span><a href="http://netty.io/4.0/api/io/netty/buffer/ByteBuf.html"><span style="text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">ByteBuf</span></a><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> is another buffering option. It has on heap as well as off heap flavors, just like NIO. Below are some </span><a href="http://openjdk.java.net/projects/code-tools/jmh/"><span style="text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">JMH</span></a><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> micro benchmark tests comparing Netty ByteBuf and NIO.</span></p>
|
| <ul style="margin-top: 0pt; margin-bottom: 0pt;">
|
| <li dir="ltr" style="list-style-type: disc; vertical-align: baseline; background-color: transparent;">
|
| <p dir="ltr" style="line-height: 1.295; margin-top: 0pt; margin-bottom: 0pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Take an NIO DirectByteBuffer (referred to as DBB in this document) and a Netty DirectByteBuf</span></p>
|
| </li>
|
| <li dir="ltr" style="list-style-type: disc; vertical-align: baseline; background-color: transparent;">
|
| <p dir="ltr" style="line-height: 1.295; margin-top: 0pt; margin-bottom: 0pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Write a long, int and short values to both of these data structures.</span></p>
|
| </li>
|
| <li dir="ltr" style="list-style-type: disc; vertical-align: baseline; background-color: transparent;">
|
| <p dir="ltr" style="line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">JMH benchmark tests read back these long, int and short values.</span></p>
|
| </li>
|
| </ul>
|
| <p dir="ltr" style="line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Our test found that the reads from DBB are far better than Netty’s Direct ByteBuf. JMH test results on jdk 1.8.45 are shown below.</span></p>
|
| <pre style="line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><font face="courier new, courier, monospace" size="1"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">@State(Scope.Thread)
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">public class NettyVsNio {
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">static {
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> io.netty.util.ResourceLeakDetector.setLevel(Level.DISABLED);
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">}
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">private ByteBuffer nioDBB = ByteBuffer.allocateDirect(14);
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">private ByteBuf nettyDBB = UnpooledByteBufAllocator.DEFAULT.directBuffer(14);
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">private Blackhole blackhole = new Blackhole();
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">@Setup
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">public void setup() {
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">nioDBB.putLong(1234L);
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">nioDBB.putInt(100);
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">nioDBB.putShort((short) 75);
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">nettyDBB.writeLong(1234L);
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">nettyDBB.writeInt(100);
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">nettyDBB.writeShort(75);
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">}
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">@Benchmark
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">public void nioOffheap() {
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">blackhole.consume(nioDBB.getLong(0));
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">blackhole.consume(nioDBB.getInt(8));
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">blackhole.consume(nioDBB.getShort(12));
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">}
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">@Benchmark
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">public void nettyOffheap() {
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">blackhole.consume(nettyDBB.getLong(0));
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">blackhole.consume(nettyDBB.getInt(8));
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">blackhole.consume(nettyDBB.getShort(12));
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">}
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">public static void main(String[] args) throws RunnerException {
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Options opt = new OptionsBuilder()
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">.include(NettyVsNio.class.getSimpleName()).warmupIterations(2)
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">.measurementIterations(4).forks(1).build();
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">new Runner(opt).run();
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">}
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">}</span>
|
| <span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);">Result "nettyOffheap":
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);"> 57366360.944 ±(99.9%) 11533933.769 ops/s [Average]
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);">Result "nioOffheap":
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);"> 60089837.738 ±(99.9%) 14171768.229 ops/s [Average]</span></font></pre></span></span><span style="line-height: normal; widows: 1;"></span>
|
| <pre style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">The performance of the reads can be improved by using Unsafe based reads, bypassing the DBB boundary check paths (See </span><a href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html"><span style="text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Bytes</span></a><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> where we already do such optimizations for byte array reads and writes). We have done similar tests with Unsafe based optimizations on NIO DBB and Netty ByteBuf. The performance, as expected, is almost identical as both follow the same code path as Unsafe API reads. NIO DBB has the edge here also.</span></pre>
|
| <pre style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><font face="courier new, courier, monospace" size="1"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);">Result "nettyOffheap":
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);"> 83613659.416 ±(99.9%) 535211.991 ops/s [Average]
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);">Result "nioOffheap":
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);"> 84514777.734 ±(99.9%) 1199369.976 ops/s [Average]</span></font></pre>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Based on the above tests, we decided to stick with the existing NIO DBB based buckets in BucketCache.</span></p><br style="line-height: normal; widows: 1;" />
|
| <h2 dir="ltr" style="margin-top: 2pt; margin-bottom: 0pt; widows: 1; line-height: 1.295;"><span style="font-weight: 400; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Cell extension on the server side</span></h2>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">As discussed above, when we directly serve Cells from off heap memory, we will need a new set of APIs for retrieving Cell components (such as row key, column family, etc.) that are off heap. When the Cell getXXXArray APIs -- i.e. getRowArray, getFamilyArray, etc., see the </span><a href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/Cell.html"><span style="text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Cell API</span></a><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> -- are used, they return a byte array (byte []). If data is off heap, we would need to copy the Cell component bytes to a temporary on heap byte array whenever any getXXXArray method is called. Such an on heap allocation and copy would undo any benefit of our offheaping effort. Since our data is off heap in NIO ByteBuffers, we need APIs to return NIO ByteBuffers added to the Cell Interface. This way we can avoid copying Cell component bytes on heap and instead directly return references to the off heap data. Corresponding to each of the getXXXArray() APIs we added an equivalent ByteBuffer backed API. </span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">The new buffer based APIs required are</span></p>
|
| <pre style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt; margin-left: 18pt;"><font face="courier new, courier, monospace" size="1"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">getRowByteBuffer()
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">getFamilyByteBuffer()
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">getQualifierByteBuffer()
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">getValueByteBuffer()
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">getTagsByteBuffer()</span></font></pre>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">The off heap backed Cells are used server side only. We don’t want to add the extra APIs to the public Cell interface used heavily by existing HBase clients. Instead we extended the interface for Cell with </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">ByteBufferedCell</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> and added these new methods here on this private server-side Type.</span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Along with all getXXXArray() methods, we have getXXXOffset() and getXXXLength() APIs in Cell so that one can refer to each component’s particular byte range in the backing byte array. In Java’s ByteBuffer, the ByteBuffer object itself can carry the offset, position and length info. But we decided against using ByteBuffers’ methods. We would have had to duplicate the backing ByteBuffer on each method invocation and then set each Cell particle offset and length into the new duplicated ByteBuffer. This duplication (extra object creation) would cause a lot of short-lived Java object creation. Also setting position and limit on ByteBuffer involves ByteBuffer limit/capacity checks.</span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Instead, we added a new set of getter APIs for the offset to be used with the returned ByteBuffers. These are</span></p>
|
| <pre style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt; margin-left: 18pt;"><font face="courier new, courier, monospace" size="1"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">int getRowPosition()
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">int getFamilyPosition()
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">int getQualifierPosition()
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">int getValuePosition()
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">int getTagsPosition()</span></font></pre>
|
| <p dir="ltr" style="line-height: normal; widows: 1; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; line-height: 11.7727px; background-color: transparent;">When </span><span style="line-height: 1.295; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">any code uses getXXXByteBuffer(), it has to use the corresponding getXXXPosition() and the existing getXXXLength to determine the range of component bytes referenced by the returned ByteBuffer. Below is the new ByteBufferCell class in its entirety:</span></p>
|
| <pre style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><font face="courier new, courier, monospace" size="1">public abstract class ByteBufferedCell implements Cell {
|
| </font></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><font face="courier new, courier, monospace" size="1"> abstract ByteBuffer getRowByteBuffer();
|
| </font></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><font face="courier new, courier, monospace" size="1"> abstract int getRowPosition();
|
| </font></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><font face="courier new, courier, monospace" size="1"> abstract ByteBuffer getFamilyByteBuffer();
|
| </font></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><font face="courier new, courier, monospace" size="1"> abstract int getFamilyPosition ();
|
| </font></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><font face="courier new, courier, monospace" size="1"> abstract ByteBuffer getQualifierByteBuffer();
|
| </font></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><font face="courier new, courier, monospace" size="1"> abstract int getQualifierPosition ();
|
| </font></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><font face="courier new, courier, monospace" size="1"> abstract ByteBuffer getValueByteBuffer();
|
| </font></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><font face="courier new, courier, monospace" size="1"> abstract int getValuePosition ();
|
| </font></span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"><font face="courier new, courier, monospace" size="1"> abstract ByteBuffer getTagsByteBuffer();
|
| </font></span><font face="courier new, courier, monospace" size="1"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> abstract int getTagsPosition (); </span><span style="white-space: pre-wrap; line-height: 1.295; background-color: transparent;">}</span></font></pre>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">It is a little awkward that the Cell interface getXXXOffset() method and the ByteBufferedCell method getXXXPosition() are similar in intent -- getting the data offset or current position in the buffer --- but one is for use when the Cell is byte array backed and the latter is for when we are returning ByteBuffers. We could have let getXXXOffset() work against the returned ByteBuffer but then we would again need to do slice()/duplicate() on the underlying ByteBuffer for every returned Cell particle ByteBuffer. This again would result in creating a lot of short lived objects. We therefore opted to add a method per return type.</span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">The other thing to note is that the implementation of </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">ByteBufferedCell</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> does not throw an exception when getXXXArray() is used with getXXXOffset() though the Cell might be backed by off heap data: i.e. ByteBuffers. In this case, if invoked, internally we copy the contents of the ByteBuffer to an on heap byte[] and use zero as the returned offset. We did this so we did not have to convert every single location in the HBase code base; tests and example code will continue to work; they will just be making an expensive copy under the covers. This seemed the least obnoxious alternative; any code paths that are doing getXXXArray when they should be doing getXXXByteBuffer will show hot in the profiler and will get converted.</span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Also note that the ByteBufferedCell extension to the Cell interface is done as an abstract class not as an interface. We found the former form to be more performant. As discussed above, we have to use buffer backed APIs instead of array backed when the Cell is an off heap one. The off heap Cell implementation will be an instance of the new ByteBufferedCell extension. So we will need instance checks in a key locations to decide which path to use processing reads. CellComparators, etc., would need to perform many such instance checks as Cells bubble-up through the server. A JMH class tested CellComparator#compare(Cell, Cell) when both sides are array backed and then both are ByteBuffer backed. The compare logic is as below.</span></p>
|
| <pre style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1">compareRows (Cell c1, Cell c2) {
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1"> if(c1 instance of ByteBuffredCell && c2 instance of ByteBuffredCell){
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1"> // return compare based on BB method on both
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1"> }
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1"> if(c1 instance of ByteBuffredCell){
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1"> // return based on BB based method on c1 and array based on c2
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1"> }
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1"> if(c2 instance of ByteBuffredCell){
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1"> // return based on BB based method on c2 and array based on c1
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1"> }
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1"> // return compare based on array method on both
|
| </font></span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap;"><font face="courier new, courier, monospace" size="1">}</font></span></pre>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">We ran the test passing different combinations of Cell types.</span></p>
|
| <pre style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><font face="courier new, courier, monospace" size="1"><span style="vertical-align: baseline; white-space: pre-wrap;"><strong>Benchmark </strong></span><span style="vertical-align: baseline; white-space: pre-wrap;"><strong>Mode Cnt Score Error Units</strong>
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap;">InstanceOfCost.withoutInstanceOf </span><span style="vertical-align: baseline; white-space: pre-wrap;">thrpt 12 42560522.072 ± 1921505.003 ops/s
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap;">InstanceOfCost.withInstanceOfBothBBCells </span><span style="vertical-align: baseline; white-space: pre-wrap;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap;"> thrpt 12 41832839.485 ± 1945841.856 ops/s
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap;">InstanceOfCost.withInstanceOfBothCells </span><span style="vertical-align: baseline; white-space: pre-wrap;"><span class="Apple-tab-span" style="white-space: pre;"> </span></span><span style="vertical-align: baseline; white-space: pre-wrap;"> thrpt 12 15500488.062 ± 220373.805 ops/s</span></font></pre>
|
| <p dir="ltr" style="line-height: normal; widows: 1; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap;">"wit<span style="line-height: 1.295;">houtInstanceOf” is the old case where compare logic doesn’t have any instance check at all. In case #2 and #3, there are instance checks as above and we pass both ByteBuffer backed cells and array backed cells respectively. We can see significant throughput decrease for case #3.</span></span></p>
|
| <p><span style="widows: 1; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Later, the same test was repeated with Cell extension as an Abstract class rather than as an Interface. Here for any combination of Cell types, we get similar throughput which is almost same as for the old case (ie. </span><span style="widows: 1; vertical-align: baseline; white-space: pre-wrap;">withoutInstanceOf). The difference was because of Java L2 compiler inlining which was not happening when the Interface based Cell extension was in place.</span></p>
|
| <h2 dir="ltr" style="margin-top: 2pt; margin-bottom: 0pt; widows: 1; line-height: 1.295;"><span style="font-weight: 400; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Comparators</span></h2>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">We have purged the old KeyValue-based KVComparator and its child classes (Refer to HBASE-10800) The compare operations in these classes assumed a KeyValue Type layout for keys (e.g. “ROW:COLUMN_FAMILY:COLUMN_QUALIFIER…” etc.). Instead we have introduced CellComparator. Internally it compares using getXXXArray() or getXXXByteBuffer() API as appropriate when comparing</span></p>
|
| <h2 dir="ltr" style="margin-top: 2pt; margin-bottom: 0pt; widows: 1; line-height: 1.295;"><span style="font-weight: 400; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Unsafe based ByteBuffer reads</span></h2>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">The performance of the reads can be improved further. We have provided a </span><a href="http://www.docjar.com/docs/api/sun/misc/Unsafe.html"><span style="text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Java Unsafe</span></a><span style="font-weight: 700; font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">way of manipulating the APIs so that we get better performance by avoiding range checks, etc., done internally by ByteBuffers. This technique is not new in HBase because we already have Unsafe support for byte array manipulations. Comparisons dones with ByteBuffers use the Unsafe based compares which makes them faster.</span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">We have also done a JMH micro benchmark tests for comparing the throughput of byte array comparison vs ByteBuffer comparison.</span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Two byte arrays are created with 135 bytes each and both arrays have the same bytes. We compare these two byte arrays using </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">org.apache.hadoop.hbase.util.Bytes#compareTo</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> which does an Unsafe based compare.</span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Two direct byte buffers (off heap) are created with a capacity of 135 bytes and both have the same content. The compare of these two byte buffers uses a newly created method, </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">org.apache.hadoop.hbase.util.ByteBufferUtils#compare</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">. We use a similar Unsafe based optimization here also.</span></p>
|
| <pre style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><font face="courier new, courier, monospace" size="1"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Results in Java 8
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);">Result "offheapCompare":
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);"> 38205893.545 ±(99.9%) 265309.769 ops/s [Average]
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);"> (min, avg, max) = (38164316.169, 38205893.545, 38246801.754), stdev = 41056.980
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);"> CI (99.9%): [37940583.776, 38471203.313] (assumes normal distribution)
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);">Result "onheapCompare":
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);"> 31166847.740 ±(99.9%) 430242.970 ops/s [Average]
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);"> (min, avg, max) = (31069672.467, 31166847.740, 31214107.795), stdev = 66580.576
|
| </span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: rgb(231, 230, 230);"> CI (99.9%): [30736604.770, 31597090.710] (assumes normal distribution)</span></font></pre>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">So we can see from the micro benchmarks results that Unsafe based comparisons can be performed on both on heap and off heap ByteBuffers and they perform better than normal non-Unsafe based comparisons.</span></p>
|
| <h2 dir="ltr" style="margin-top: 2pt; margin-bottom: 0pt; widows: 1; line-height: 1.295;"><span style="font-weight: 400; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">ByteBuff wrapper Interface over NIO buffers</span></h2>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">When an HFile block is available in the BucketCache, its data might be split across more than one bucket. Each bucket has a ByteBuffer of various sizes starting at 4 MB. By default our HFile block size is well below this (ie. 64 KB). We can see that there will be more than one block served by a single bucket buffer. There can be some blocks with data beginning at the edge of a bucket that span into the next bucket also. We want to avoid any sort of copy when reading blocks out of BucketCache and want a single data structure that can span multiple NIO buffers. Since NIO ByteBuffers cannot be sub-classed we could not use an extension to wrap multiple ByteBuffers. So we have added a new Interface (With a subset of the APIs from ByteBuffer) named </span><a href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/nio/ByteBuff.html"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">ByteBuff</span></a><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">. We have 2 implementations for it. When the backing is by a single ByteBuffer (most of the cases from BucketCache and also in cases such as the on heap LRU L1 cache), we make a </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">SingleByteBuff</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> instance. This just delegates calls to the wrapped NIO ByteBuffer. We also have a </span><span style="font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">MultiByteBuff</span><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> when the data spans across more than one ByteBuffer. The HFileBlock’s data structure type has been changed to be a ByteBuff.</span></p><br style="line-height: normal; widows: 1;" />
|
| <h2 dir="ltr" style="margin-top: 2pt; margin-bottom: 0pt; widows: 1; line-height: 1.295;"><span style="font-weight: 400; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">Prevention of Block Eviction from BucketCache when in use</span></h2>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">BucketCache evicts blocks and frees the associated buckets when it runs out of space for caching new blocks. Before HBASE-11425, any block could be evicted anytime as we copied the buckets’ content to a new ByteBuffer when a block was read from the BlockCache. Since HBASE-11425, we go out of our way to avoid this copy on to the heap of the BlockCache content. The data is instead served directly from the BucketCache’s buckets. So we have to prevent the eviction of blocks that are being actively read. We use a simple reference counting mechanism. The reference count of a block is incremented whenever it is read from the Cache (Using the getBlock API). We have added a new API to BlockCache to return a block back to Cache after the read on that block is over. The reference count is decremented then. The block can be evicted iff its reference count is zero.</span></p>
|
| <p dir="ltr" style="widows: 1; line-height: 1.295; margin-top: 0pt; margin-bottom: 8pt;"><span style="vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">The reads that we perform refer to Cells that are serialized in these buckets. So as long as a scanner is still running, these blocks cannot be evicted. Eviction would mean that the Cells that are referred to by the reads could be corrupted as eviction would have changed the backing blockbytes. </span></p>
|
| <p><span style="widows: 1; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">The reference to these Cells in the buckets will be removed only after the Cells have been passed back to the Client. In the RPC tier, a </span><span style="widows: 1; font-style: italic; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;">CellBlock</span><span style="widows: 1; vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"> is created in the RpcServer where the cached Cells that help make up client results are copied -- encoded and compressed as appropriate -- from the referenced BucketCache buckets before being passed back to the client (as protobufs). At this point, all the references to the buckets are considered freed. Reference counting also means we delay the close of the scanners until the RPC layer is done writing out the read content.</span></p>
|
| <p>In <a href="https://blogs.apache.org/hbase/entry/offheaping_the_read_path_in1">part two</a>, we detail comparison of on and off heap read paths.</p>
|
| <h2 dir="ltr" style="margin-top: 2pt; margin-bottom: 0pt; widows: 1; line-height: 1.295;"> </h2>
|
| <p> </p> |