[rpc] introduce rpc_listened_socket_rx_queue_size metric

This patch introduces a new 'rpc_listened_socket_rx_queue_size'
histogram metric for AcceptorPool.  The metric tracks the size of
the listening RPC socket's RX queue.  It reports meaningful values
only on Linux because it relies on DiagnosticSocket, which is
currently implemented only for Linux.

The new metric is sampled by each acceptor thread when accepting
an RPC connection.  It's possible to change the frequency of the
sampling (completely disabling it, if necessary) by tuning the
--rpc_listen_socket_stats_every_log2 flag.
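
As a rough illustration of the log2-based setting (a sketch only, not
the code in this patch; the ShouldSampleRxQueue name and the
per-connection sequence number are hypothetical), the decision to
sample can be expressed with a power-of-two mask.  The mapping of -1,
0, 3 and 5 to "disabled", "every 1", "every 8" and "every 32"
connections matches the benchmark output below:

```cpp
#include <cstdint>

// Hypothetical helper, not Kudu's actual code: decides whether the
// connection with the given 0-based sequence number should be sampled.
//   every_log2 < 0  : sampling is disabled
//   every_log2 == 0 : every connection is sampled
//   every_log2 == 3 : every 8th connection is sampled
bool ShouldSampleRxQueue(uint64_t conn_seq, int every_log2) {
  if (every_log2 < 0) {
    return false;
  }
  if (every_log2 > 63) {
    // Assumption: clamp so the shift below stays well-defined.
    every_log2 = 63;
  }
  // For a power-of-two period 2^N, (seq % 2^N) == (seq & (2^N - 1)).
  const uint64_t mask = (uint64_t{1} << every_log2) - 1;
  return (conn_seq & mask) == 0;
}
```

Using a mask instead of a modulo keeps the check cheap on the accept
path, which is presumably why the flag is expressed as a power of two.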

I added basic test scenarios to cover the newly introduced
functionality.

In addition, an extra performance test scenario has been added to
rpc-bench.  The new scenario measures the extra latency introduced
by capturing diagnostic snapshots on the listening RPC socket.  Some
results are below (3 measurements at each setting), and based on these
I think that setting --rpc_listen_socket_stats_every_log2=3 by default
makes sense: it adds about 1.2us of latency on average per RPC request.
Since the overhead is on the order of a microsecond, the overall
latency of handling RPC requests and the sustainable RPC rate do not
seem to be adversely affected by this patch.  If anybody finds
otherwise, they can always set --rpc_listen_socket_stats_every_log2=-1
for their Kudu cluster if they don't care about the listening socket's
backlog stats.

The results below were captured by running the following command
for N in the set of { -1, 0, 3, 5 }:
  ./rpc-bench --gtest_filter='*RpcAcceptorBench*' \
      --client_threads=2 \
      --rpc_listen_socket_stats_every_log2=N

  -----------------------------------------------------------------------

  collecting diagnostics on the listening RPC socket ... is disabled
  Dispatched 99851 connection requests in 1 seconds
  Request dispatching time (us): min 0 max 48 average 2.4426495478262611

  Dispatched 98651 connection requests in 1 seconds
  Request dispatching time (us): min 0 max 54 average 2.4904663916229941

  Dispatched 99200 connection requests in 1 seconds
  Request dispatching time (us): min 0 max 53 average 2.4747076612903225

  -----------------------------------------------------------------------

  collecting diagnostics on the listening RPC socket ... every 1 connection(s)
  Dispatched 65383 connection requests in 1 seconds
  Request dispatching time (us): min 6 max 208 average 11.071424201639545

  Dispatched 65162 connection requests in 1 seconds
  Request dispatching time (us): min 6 max 392 average 11.258256092320913

  Dispatched 65428 connection requests in 1 seconds
  Request dispatching time (us): min 6 max 290 average 11.209445208619899

  -----------------------------------------------------------------------

  collecting diagnostics on the listening RPC socket ... every 8 connection(s)
  Dispatched 99902 connection requests in 1 seconds
  Request dispatching time (us): min 0 max 148 average 3.628295729815219

  Dispatched 98139 connection requests in 1 seconds
  Request dispatching time (us): min 0 max 101 average 3.6546429044518489

  Dispatched 101681 connection requests in 1 seconds
  Request dispatching time (us): min 0 max 98 average 3.549552030369489

  -----------------------------------------------------------------------

  collecting diagnostics on the listening RPC socket ... every 32 connection(s)
  Dispatched 100832 connection requests in 1 seconds
  Request dispatching time (us): min 0 max 114 average 2.727457553157727

  Dispatched 100214 connection requests in 1 seconds
  Request dispatching time (us): min 0 max 71 average 2.8103658171512964

  Dispatched 99106 connection requests in 1 seconds
  Request dispatching time (us): min 0 max 52 average 2.7968841442495913

Change-Id: I83580659bac39d9171f1ee0d0e88676ed0d50b99
Reviewed-on: http://gerrit.cloudera.org:8080/20908
Tested-by: Alexey Serbin <alexey@apache.org>
Reviewed-by: Yingchun Lai <laiyingchun@apache.org>
11 files changed