e5ace5fa28154fa906c1b087e3b80461b6d85337 KUDU-2353: tool to parse metrics out of diagnostic logs
This patch contains a C++ implementation of the metrics log parser script.
There are a couple of functional differences between this tool and the
existing script:
- This tool recognizes table metrics
- This tool allows filtering metrics by table or tablet identifier
- For histogram metrics, this tool also outputs the rough count of
measurements
This patch also addresses KUDU-2597 as a subtask of the JIRA item
mentioned in the summary.
NOTES:
- Kudu metrics are only written to the diagnostic log when a metric
has changed, so this patch tracks the metric values per entity
(tablet ID, table ID, server) at each point in time in order
to output the correct values. This means that if, within a given set
of files, a tablet's metric has not changed and no corresponding
records are in the files, this tool does not print any information
about that tablet's metrics.
- Kudu histogram metrics do output a summary of percentiles. The
tool explicitly does not use that summary and instead computes the
percentiles from the histogram counts. While less accurate (IIUC, the
counts can be lossy), this allows us to generate aggregated
summaries across multiple entities.
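The per-entity bookkeeping described in the first note can be illustrated with a minimal sketch. It assumes a single counter metric and a parser that calls `Update()` for every entity that appears in a log record. `LastValueTracker` and its methods are hypothetical names, not Kudu's actual classes. Because unchanged metrics are absent from the log, the last known value of each entity is carried forward. An entity that never appears in the parsed files stays unknown.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch, not Kudu's implementation: the diagnostics log only
// carries a metric when it has changed, so the parser remembers the last
// value seen for each entity (tablet ID, table ID, or server) and carries it
// forward to later timestamps.
class LastValueTracker {
 public:
  // Records the value reported for `entity` in the current log record.
  void Update(const std::string& entity, int64_t value) {
    assert(!entity.empty());
    last_value_[entity] = value;
  }

  // Sums the last known value of every entity seen so far, including
  // entities that were absent from the most recent record.
  int64_t Sum() const {
    int64_t total = 0;
    for (const auto& entry : last_value_) {
      total += entry.second;
    }
    return total;
  }

  // An entity with no records in the parsed files has no known value,
  // so nothing can be reported for it.
  bool Knows(const std::string& entity) const {
    return last_value_.find(entity) != last_value_.end();
  }

 private:
  std::map<std::string, int64_t> last_value_;
};
```

Suppose one record reports `tablet-a` = 10 and `tablet-b` = 5, giving `Sum()` = 15. A later record that only mentions `tablet-a` = 12 gives `Sum()` = 17, because `tablet-b` still contributes its last value of 5.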
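The second note, which derives percentiles from histogram counts so that several entities can be combined, can be sketched in the same spirit. This sketch rests on several assumptions:

* Each entity's histogram is assumed to be available as parallel arrays of bucket values and per-bucket counts.
* Percentiles use a simple nearest-rank rule.
* `MergedHistogram`, `MergeInto()`, and `ValueAtPercentile()` are hypothetical names.

The exact representation and rule used by the tool are not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sketch: an aggregated histogram keyed by bucket value.
// Every stored count is positive.
using MergedHistogram = std::map<int64_t, int64_t>;  // bucket value -> count

// Adds one entity's buckets (assumed to be parallel value/count arrays)
// into the aggregate. Empty buckets are skipped.
void MergeInto(MergedHistogram* merged,
               const std::vector<int64_t>& values,
               const std::vector<int64_t>& counts) {
  assert(values.size() == counts.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (counts[i] > 0) {
      (*merged)[values[i]] += counts[i];
    }
  }
}

// Total number of measurements, i.e. the "_count" column.
int64_t TotalCount(const MergedHistogram& h) {
  int64_t total = 0;
  for (const auto& bucket : h) {
    total += bucket.second;
  }
  return total;
}

// Nearest-rank percentile: the smallest bucket value whose cumulative count
// reaches `percentile` percent of all measurements. 0 yields the minimum,
// 100 the maximum. Returns -1 for an empty histogram.
int64_t ValueAtPercentile(const MergedHistogram& h, double percentile) {
  assert(percentile >= 0.0 && percentile <= 100.0);
  const int64_t total = TotalCount(h);
  if (total == 0) {
    return -1;
  }
  const double target = percentile / 100.0 * static_cast<double>(total);
  int64_t cumulative = 0;
  for (const auto& bucket : h) {
    cumulative += bucket.second;
    if (static_cast<double>(cumulative) >= target) {
      return bucket.first;
    }
  }
  return h.rbegin()->first;  // Guard against floating-point rounding.
}
```

Merging bucket counts, rather than averaging per-entity percentile summaries, is what makes a combined percentile meaningful. The result is only as precise as the bucket values themselves, which matches the caveat in the note above.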
Here's an example:
[awong@va1022 release]$ ./bin/kudu diagnose parse_metrics kudu-tserver.worker12.foobar.com.kudu.diagnostics.20210123-201217.0.74565 --simple_metrics=tablet.scans_started:num_scans_started --rate_metrics=tablet.scans_started:scans_started_per_sec --histogram_metrics=server.scanner_duration:scanner_duration_us,server.handler_latency_kudu_tserver_TabletServerService_Scan:scan_rpc_us
I0131 11:53:27.010298 151768 diagnostics_log_parser.cc:272] collecting simple metric tablet.scans_started as num_scanners_started
I0131 11:53:27.010438 151768 diagnostics_log_parser.cc:279] collecting rate metric tablet.scans_started as scanners_started_per_sec
I0131 11:53:27.010455 151768 diagnostics_log_parser.cc:286] collecting histogram metric server.handler_latency_kudu_tserver_TabletServerService_Scan as scan_rpc_us
I0131 11:53:27.010524 151768 diagnostics_log_parser.cc:286] collecting histogram metric server.scanner_duration as scanner_duration_us
timestamp num_scanners_started scanners_started_per_sec scan_rpc_us_count scan_rpc_us_min scan_rpc_us_p50 scan_rpc_us_p75 scan_rpc_us_p95 scan_rpc_us_p99 scan_rpc_us_p99_99 scan_rpc_us_max scanner_duration_us_count scanner_duration_us_min scanner_duration_us_p50 scanner_duration_us_p75 scanner_duration_us_p95 scanner_duration_us_p99 scanner_duration_us_p99_99 scanner_duration_us_max
1611432793767488 68492 0 434170147 2 1215 1639 3711 12927 501759 8650751 1854125 12 23295 1302527 54788095 60030975 60030975 60030975
1611432853767552 231516 2717.0637684653134 434198546 2 1215 1639 3711 12927 501759 8650751 1854200 12 23295 1302527 54788095 60030975 60030975 60030975
1611432913767616 349073 1959.2812434333403 434227285 2 1215 1639 3711 12927 501759 8650751 1854306 12 23295 1302527 54788095 60030975 60030975 60030975
1611432973767689 829597 8008.7235893863 434255021 2 1215 1639 3711 12927 501759 8650751 1854517 12 23295 1302527 54788095 60030975 60030975 60030975
1611433033767772 926516 1615.314432148369 434283184 2 1215 1639 3711 12927 501759 8650751 1854605 12 23295 1302527 54788095 60030975 60030975 60030975
1611433093767841 926626 1.8333312250024245 434309627 2 1215 1639 3711 12927 501759 8650751 1854719 12 23295 1302527 54788095 60030975 60030975 60030975
1611433153767902 960053 557.11610026529809 434339928 2 1215 1639 3711 12927 501759 8650751 1854788 12 23295 1302527 54788095 60030975 60030975 60030975
1611433213767967 1009625 826.19910495096963 434366776 2 1215 1639 3711 12927 501759 8650751 1854831 12 23295 1302527 54788095 60030975 60030975 60030975
1611433273768032 1059960 838.91575784126235 434394555 2 1215 1639 3711 12927 501759 8650751 1854966 12 23295 1302527 54788095 60030975 60030975 60030975
1611433333768067 1061577 26.949984279175837 434420683 2 1215 1639 3711 12927 501759 8650751 1855023 12 23295 1302527 54788095 60030975 60030975 60030975
1611433393768130 1082096 341.98297425121041 434447991 2 1215 1639 3711 12927 501759 8650751 1855185 12 23295 1302527 54788095 60030975 60030975 60030975
1611433453768205 1083102 16.76664570835953 434476348 2 1215 1639 3711 12927 501759 8650751 1855285 12 23295 1302527 54788095 60030975 60030975 60030975
1611433513768270 1088338 87.2665721278802 434498551 2 1215 1639 3711 12927 501759 8650751 1855388 12 23295 1302527 54788095 60030975 60030975 60030975
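The rate column in the output above appears to be the change in the counter divided by the elapsed time between consecutive rows, with timestamps in microseconds. For example, (231516 - 68492) / 60.000064 s ≈ 2717.06, which matches the second row. The first row has no predecessor and shows 0.

The following sketch reproduces that arithmetic. `RatePerSec()` and `RateTracker` are hypothetical names, and the microsecond unit is inferred from the sample rather than stated by the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch: per-second rate of a monotonically increasing counter
// between two samples. Timestamps are assumed to be in microseconds.
double RatePerSec(int64_t prev_value, int64_t prev_ts_us,
                  int64_t cur_value, int64_t cur_ts_us) {
  assert(cur_ts_us > prev_ts_us);
  const double elapsed_sec =
      static_cast<double>(cur_ts_us - prev_ts_us) / 1e6;
  return static_cast<double>(cur_value - prev_value) / elapsed_sec;
}

// Feeds samples in timestamp order. The first sample has nothing to compare
// against, so its rate is reported as 0.
class RateTracker {
 public:
  double Add(int64_t ts_us, int64_t value) {
    double rate = 0.0;
    if (has_prev_) {
      rate = RatePerSec(prev_value_, prev_ts_us_, value, ts_us);
    }
    has_prev_ = true;
    prev_ts_us_ = ts_us;
    prev_value_ = value;
    return rate;
  }

 private:
  bool has_prev_ = false;
  int64_t prev_ts_us_ = 0;
  int64_t prev_value_ = 0;
};
```

Feeding the first two rows of the sample (timestamp, `num_scanners_started`) into a `RateTracker` returns 0 and then approximately 2717.0638.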
Change-Id: I8077fb4f6b41fe4b2bd6c877af379ea7a9f415b1
Reviewed-on: http://gerrit.cloudera.org:8080/12570
Tested-by: Kudu Jenkins
Reviewed-by: Abhishek Chennaka <achennaka@cloudera.com>
Reviewed-by: Alexey Serbin <alexey@apache.org>
8 files changed