Protobuf should be serialized to direct memory instead of heap memory (#2441)

Descriptions of the changes in this PR:

### Motivation

Protobuf serialization is the last step of the netty pipeline. Currently, we allocate a heap buffer during serialization
and pass it down to the netty library. In `AbstractChannel#filterOutboundMessage()`, netty copies that data into a direct
buffer if it currently lives on the heap (otherwise it skips the copy and uses the buffer as-is).
Serializing straight into a direct buffer avoids unnecessary CPU cycles spent on buffer copies and takes pressure off the
GC, since there is less memory churn. In some stress tests of the BK client, I see up to a 33% reduction in the number of
GCs with the same average run duration. The workload performs writes of varying sizes, ranging from 1 KB to 512 KB.
Bookies aren't usually CPU bound, so on the bookie side this change improves the READ_ENTRY code path only by a small factor.
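
Roughly, the change amounts to serializing straight into a direct buffer. Below is a minimal, self-contained sketch of that
pattern (the class and method names are illustrative, and the error handling and explicit writer-index advance are
assumptions about the surrounding code, not a copy of the patch):

```java
import com.google.protobuf.CodedOutputStream;
import com.google.protobuf.MessageLite;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

import java.io.IOException;

final class DirectProtoSerializer {

    /**
     * Serialize a protobuf message straight into a direct (off-heap) netty buffer, so that
     * netty's filterOutboundMessage() step does not have to copy it from heap to direct memory.
     */
    static ByteBuf serializeToDirect(MessageLite msg, ByteBufAllocator allocator) {
        int size = msg.getSerializedSize();
        // Fixed-capacity direct buffer: the serialized size is known up front.
        ByteBuf buf = allocator.directBuffer(size, size);
        try {
            // Write through an NIO view of the buffer's first `size` bytes. For a direct
            // buffer, CodedOutputStream writes into that memory without an intermediate
            // heap array.
            msg.writeTo(CodedOutputStream.newInstance(buf.nioBuffer(buf.writerIndex(), size)));
        } catch (IOException e) {
            // In-memory serialization should not fail; release the buffer to avoid a leak.
            buf.release();
            throw new RuntimeException(e);
        }
        // The NIO view bypasses netty's writer index, so advance it manually.
        buf.writerIndex(buf.writerIndex() + size);
        return buf;
    }
}
```

The buffer handed down the pipeline is already direct, so netty can write it to the socket without the extra copy it would
otherwise perform for heap buffers.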


### Changes

Use direct memory instead of heap memory for the protobuf serialization buffer


* Added comments
diff --git a/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java b/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java
index 0c237c4..50fdd2f 100644
--- a/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java
+++ b/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java
@@ -356,10 +356,17 @@
 
     private static ByteBuf serializeProtobuf(MessageLite msg, ByteBufAllocator allocator) {
         int size = msg.getSerializedSize();
-        ByteBuf buf = allocator.heapBuffer(size, size);
+        // Protobuf serialization is the last step of the netty pipeline. We used to allocate
+        // a heap buffer while serializing and pass it down to the netty library.
+        // In AbstractChannel#filterOutboundMessage(), netty copies that data to a direct buffer if
+        // it is currently on the heap (otherwise it skips the copy and uses it directly).
+        // Allocating a direct buffer avoids unnecessary CPU cycles for buffer copies in the BK client
+        // and also takes pressure off the GC, since there is less memory churn.
+        // Bookies aren't usually CPU bound. This change improves READ_ENTRY code paths by a small factor as well.
+        ByteBuf buf = allocator.directBuffer(size, size);
 
         try {
-            msg.writeTo(CodedOutputStream.newInstance(buf.array(), buf.arrayOffset() + buf.writerIndex(), size));
+            msg.writeTo(CodedOutputStream.newInstance(buf.nioBuffer(buf.readerIndex(), size)));
         } catch (IOException e) {
             // This is in-memory serialization, should not fail
             throw new RuntimeException(e);