KUDU-2353: tool to parse metrics out of diagnostic logs

This patch contains a C++ implementation of the metrics log parser
script. There are a couple of functional differences between this tool
and the existing script:
  - This tool recognizes table metrics
  - This tool allows for filtering metrics by table or tablet
    identifier
  - This tool also outputs an approximate count of measurements for
    histogram metrics

This patch also addresses KUDU-2597 as a subtask of the JIRA item
mentioned in the summary.

NOTES:
  - Kudu metrics are only output into the diagnostic log when a metric
    has changed, so this patch tracks the metric values per entity
    (tablet ID, table ID, server) at each point in time in order
    to output the correct values. This means that if, within a given set
    of files, a tablet's metric has not changed and no corresponding
    records are in the files, this tool prints no information on the
    tablet's metrics.

  - Kudu histogram metrics do spit out a summary for percentiles. The
    tool explicitly does not use that and instead generates these
    metrics from the histogram counts. While less accurate (IIUC, the
    counts can be lossy), this allows us to generate aggregated
    summaries from multiple entities.
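
To illustrate the second note, here is a minimal sketch of deriving a
percentile from bucket counts merged across several entities. It is not
the code in this patch: BucketCounts, MergeCounts and ValueAtPercentile
are hypothetical names, and the bucket layout (value -> sample count) is
an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical representation of one histogram: bucket value -> count.
using BucketCounts = std::map<int64_t, int64_t>;

// Adds the counts of 'src' into 'dst', bucket by bucket. Calling this
// once per entity yields a single aggregated histogram.
void MergeCounts(const BucketCounts& src, BucketCounts* dst) {
  for (const auto& kv : src) {
    (*dst)[kv.first] += kv.second;
  }
}

// Returns the smallest bucket value whose cumulative count reaches
// 'percentile' percent of all samples, or -1 if there are no samples.
int64_t ValueAtPercentile(const BucketCounts& counts, double percentile) {
  assert(percentile >= 0.0 && percentile <= 100.0);
  int64_t total = 0;
  for (const auto& kv : counts) {
    total += kv.second;
  }
  if (total == 0) {
    return -1;
  }
  const double target = total * percentile / 100.0;
  int64_t seen = 0;
  for (const auto& kv : counts) {
    seen += kv.second;
    if (seen >= target) {
      return kv.first;
    }
  }
  // Only reachable through floating-point rounding; fall back to the max.
  return counts.rbegin()->first;
}
```

Precomputed percentile summaries cannot be combined this way: the p99
of two tablets is not a function of each tablet's p99, whereas bucket
counts simply add.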

Here's an example:

[awong@va1022 release]$ ./bin/kudu diagnose parse_metrics kudu-tserver.worker12.foobar.com.kudu.diagnostics.20210123-201217.0.74565 --simple_metrics=tablet.scans_started:num_scanners_started --rate_metrics=tablet.scans_started:scanners_started_per_sec --histogram_metrics=server.scanner_duration:scanner_duration_us,server.handler_latency_kudu_tserver_TabletServerService_Scan:scan_rpc_us
I0131 11:53:27.010298 151768 diagnostics_log_parser.cc:272] collecting simple metric tablet.scans_started as num_scanners_started
I0131 11:53:27.010438 151768 diagnostics_log_parser.cc:279] collecting rate metric tablet.scans_started as scanners_started_per_sec
I0131 11:53:27.010455 151768 diagnostics_log_parser.cc:286] collecting histogram metric server.handler_latency_kudu_tserver_TabletServerService_Scan as scan_rpc_us
I0131 11:53:27.010524 151768 diagnostics_log_parser.cc:286] collecting histogram metric server.scanner_duration as scanner_duration_us
timestamp       num_scanners_started       scanners_started_per_sec   scan_rpc_us_count       scan_rpc_us_min scan_rpc_us_p50 scan_rpc_us_p75 scan_rpc_us_p95 scan_rpc_us_p99 scan_rpc_us_p99_99      scan_rpc_us_max scanner_duration_us_count       scanner_duration_us_min scanner_duration_us_p50 scanner_duration_us_p75 scanner_duration_us_p95 scanner_duration_us_p99 scanner_duration_us_p99_99  scanner_duration_us_max
1611432793767488        68492   0       434170147       2       1215    1639    3711    12927   501759  8650751 1854125 12      23295   1302527 54788095        60030975        60030975        60030975
1611432853767552        231516  2717.0637684653134      434198546       2       1215    1639    3711    12927   501759  8650751 1854200 12      23295   1302527 54788095        60030975        60030975        60030975
1611432913767616        349073  1959.2812434333403      434227285       2       1215    1639    3711    12927   501759  8650751 1854306 12      23295   1302527 54788095        60030975        60030975        60030975
1611432973767689        829597  8008.7235893863 434255021       2       1215    1639    3711    12927   501759  8650751 1854517 12      23295   1302527 54788095        60030975        60030975        60030975
1611433033767772        926516  1615.314432148369       434283184       2       1215    1639    3711    12927   501759  8650751 1854605 12      23295   1302527 54788095        60030975        60030975        60030975
1611433093767841        926626  1.8333312250024245      434309627       2       1215    1639    3711    12927   501759  8650751 1854719 12      23295   1302527 54788095        60030975        60030975        60030975
1611433153767902        960053  557.11610026529809      434339928       2       1215    1639    3711    12927   501759  8650751 1854788 12      23295   1302527 54788095        60030975        60030975        60030975
1611433213767967        1009625 826.19910495096963      434366776       2       1215    1639    3711    12927   501759  8650751 1854831 12      23295   1302527 54788095        60030975        60030975        60030975
1611433273768032        1059960 838.91575784126235      434394555       2       1215    1639    3711    12927   501759  8650751 1854966 12      23295   1302527 54788095        60030975        60030975        60030975
1611433333768067        1061577 26.949984279175837      434420683       2       1215    1639    3711    12927   501759  8650751 1855023 12      23295   1302527 54788095        60030975        60030975        60030975
1611433393768130        1082096 341.98297425121041      434447991       2       1215    1639    3711    12927   501759  8650751 1855185 12      23295   1302527 54788095        60030975        60030975        60030975
1611433453768205        1083102 16.76664570835953       434476348       2       1215    1639    3711    12927   501759  8650751 1855285 12      23295   1302527 54788095        60030975        60030975        60030975
1611433513768270        1088338 87.2665721278802        434498551       2       1215    1639    3711    12927   501759  8650751 1855388 12      23295   1302527 54788095        60030975        60030975        60030975
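
The scanners_started_per_sec column above is consistent with dividing
the change in the counter by the elapsed time between two rows. A
minimal sketch of that arithmetic follows; Sample and RatePerSecond are
hypothetical names, and treating the timestamp as microseconds is an
assumption based on the values shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of one counter reading taken from an output row.
struct Sample {
  int64_t timestamp_us;  // assumed to be microseconds since the epoch
  int64_t value;         // cumulative counter value
};

// Returns the per-second rate of change between two samples, or 0 when
// no time has elapsed between them.
double RatePerSecond(const Sample& prev, const Sample& cur) {
  assert(cur.timestamp_us >= prev.timestamp_us);
  const int64_t elapsed_us = cur.timestamp_us - prev.timestamp_us;
  if (elapsed_us == 0) {
    return 0.0;
  }
  return static_cast<double>(cur.value - prev.value) * 1e6 / elapsed_us;
}
```

For the first two rows, (231516 - 68492) / 60.000064 s gives roughly
2717.06 per second, which agrees with the second row of the output.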

Change-Id: I8077fb4f6b41fe4b2bd6c877af379ea7a9f415b1
Reviewed-on: http://gerrit.cloudera.org:8080/12570
Tested-by: Kudu Jenkins
Reviewed-by: Abhishek Chennaka <achennaka@cloudera.com>
Reviewed-by: Alexey Serbin <alexey@apache.org>