In large clusters (500+ nodes), AMS aggregations sometimes run into performance problems. In the ambari-metrics-collector log file, these show up as log lines like the following:
20:51:30,952 INFO 2080712366@qtp-974606690-381 AsyncProcess:1597 - #1, waiting for 13948 actions to finish
20:51:31,601 INFO 1279097595@qtp-974606690-359 AsyncProcess:1597 - #1, waiting for 19376 actions to finish
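To check whether a collector log shows this symptom, a small scan for the "waiting for N actions to finish" messages can help. The following is a minimal sketch, assuming the message format matches the two lines quoted above; the function names and the 10000 threshold are hypothetical choices for illustration, not AMS settings.

```python
import re

# Matches the AsyncProcess backlog messages quoted above, for example
# "AsyncProcess:1597 - #1, waiting for 13948 actions to finish".
WAITING_RE = re.compile(r"AsyncProcess:\d+ - #\d+, waiting for (\d+) actions to finish")

# Hypothetical alert threshold; tune it for your own cluster.
DEFAULT_THRESHOLD = 10000


def pending_action_counts(lines):
    """Return the pending-action count from every matching message, in order."""
    counts = []
    for line in lines:
        # finditer also handles several messages fused onto one line.
        for match in WAITING_RE.finditer(line):
            counts.append(int(match.group(1)))
    return counts


def looks_backlogged(lines, threshold=DEFAULT_THRESHOLD):
    """Heuristic: True if any message reports more pending actions than `threshold`."""
    return any(count > threshold for count in pending_action_counts(lines))


if __name__ == "__main__":
    sample = [
        "20:51:30,952 INFO 2080712366@qtp-974606690-381 AsyncProcess:1597 - #1, waiting for 13948 actions to finish",
        "20:51:31,601 INFO 1279097595@qtp-974606690-359 AsyncProcess:1597 - #1, waiting for 19376 actions to finish",
    ]
    print(pending_action_counts(sample))  # [13948, 19376]
    print(looks_backlogged(sample))  # True
```

To scan a real log, pass an open file object instead of the sample list, for example `with open(log_path) as f: print(looks_backlogged(f))`, where `log_path` is wherever your collector writes its log.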
In Ambari 3.0.0, we are tackling these performance issues through a complete revamp of the schema and aggregation logic. Until then, AMS whitelisting can be used to reduce the number of metrics tracked by AMS, thereby addressing this scale problem.
Until Ambari 2.4.3
A metric whitelist file can be used to specify the set of metrics tracked by AMS. All other metrics are discarded.
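The effect of a whitelist file can be sketched as a simple membership filter. This is an illustration, not the AMS implementation: it assumes the file lists one metric name per line, and the helper names and metric names below are hypothetical.

```python
def load_whitelist(text):
    """Parse whitelist file contents, assumed to hold one metric name per line."""
    # Blank lines are skipped; surrounding whitespace is ignored.
    return {line.strip() for line in text.splitlines() if line.strip()}


def filter_metrics(metric_names, whitelist):
    """Keep only whitelisted metrics; everything else is discarded."""
    return [name for name in metric_names if name in whitelist]


if __name__ == "__main__":
    # Illustrative metric names only.
    whitelist = load_whitelist("cpu_user\nmem_free\n\n")
    incoming = ["cpu_user", "disk_free", "mem_free", "bytes_in"]
    print(filter_metrics(incoming, whitelist))  # ['cpu_user', 'mem_free']
```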
STEPS
From Ambari 2.5.0 onwards
Ambari 2.5.0 added further refinements to whitelisting:
App Blacklisting - Blacklist metrics from one or more services. Metrics from all other services are either allowed entirely or controlled through a whitelist file.
ams-site : timeline.metrics.apps.blacklist = hbase,namenode
App Whitelisting - Whitelist metrics from one or more services.
ams-site : timeline.metrics.apps.whitelist = nimbus,datanode
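Both app-level properties take a comma-separated list of app names. The sketch below parses such values and flags apps that appear in both lists; it is a hypothetical convenience check, assuming that surrounding whitespace in the values is insignificant, and it does not reflect how AMS itself resolves an app listed in both properties.

```python
def parse_app_list(value):
    """Split a comma-separated ams-site value such as "hbase,namenode" into app names."""
    return {app.strip() for app in value.split(",") if app.strip()}


def conflicting_apps(blacklist_value, whitelist_value):
    """Hypothetical sanity check: apps listed in both properties are ambiguous."""
    return sorted(parse_app_list(blacklist_value) & parse_app_list(whitelist_value))


if __name__ == "__main__":
    blacklist = parse_app_list("hbase,namenode")
    whitelist = parse_app_list("nimbus,datanode")
    print(sorted(blacklist), sorted(whitelist))  # ['hbase', 'namenode'] ['datanode', 'nimbus']
    print(conflicting_apps("hbase,namenode", "namenode, nimbus"))  # ['namenode']
```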
NOTE : The app name can be found from the metadata URL:
http://<metrics_collector_host>:6188/ws/v1/timeline/metrics/metadata
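The app names can also be listed from a script. This is a minimal sketch using only the standard library; it assumes the metadata endpoint returns a JSON object whose top-level keys are the app names, so check the actual response from your collector before relying on it. The function names and the `list_apps.py` script name are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_metadata(collector_host, port=6188, timeout=10):
    """GET the AMS metadata endpoint and return the decoded JSON body."""
    url = f"http://{collector_host}:{port}/ws/v1/timeline/metrics/metadata"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def app_names(metadata):
    """Return the app names, assuming the response is a JSON object keyed by app name."""
    return sorted(metadata)


if __name__ == "__main__":
    import sys

    # Usage: python list_apps.py <metrics_collector_host>
    for name in app_names(fetch_metadata(sys.argv[1])):
        print(name)
```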
Metric Whitelisting - Same as the whitelisting method in Ambari 2.4.3 (through a whitelist file). In addition to metric names, the whitelist file can also contain patterns, marked with the ._p_ prefix. For example, patterns can be specified as follows:
._p_dfs.FSNamesystem.*
._p_jvm.JvmMetrics*
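A whitelist file that mixes plain names and ._p_ patterns can be modeled by splitting the two kinds of entries and testing a metric against both. This sketch assumes the patterns behave like shell-style wildcards (`*` matches any run of characters), which is consistent with the two examples above but is an assumption about AMS matching semantics; the helper names are hypothetical.

```python
from fnmatch import fnmatchcase

PATTERN_PREFIX = "._p_"


def parse_whitelist(text):
    """Split whitelist contents into exact metric names and ._p_ patterns."""
    names, patterns = set(), []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(PATTERN_PREFIX):
            # Strip the marker; the rest is treated as a wildcard pattern (assumed).
            patterns.append(line[len(PATTERN_PREFIX):])
        else:
            names.add(line)
    return names, patterns


def is_whitelisted(metric_name, names, patterns):
    """True if the metric matches an exact name or any pattern."""
    if metric_name in names:
        return True
    return any(fnmatchcase(metric_name, pattern) for pattern in patterns)


if __name__ == "__main__":
    names, patterns = parse_whitelist("cpu_user\n._p_dfs.FSNamesystem.*\n._p_jvm.JvmMetrics*\n")
    print(is_whitelisted("dfs.FSNamesystem.CapacityUsed", names, patterns))  # True
    print(is_whitelisted("dfs.datanode.BytesWritten", names, patterns))  # False
```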
An example of a metric whitelist file that contains both metric names and patterns: https://github.com/apache/ambari/blob/trunk/ambari-metrics/ambari-metrics-timelineservice/src/test/resources/test_data/metric_whitelist.dat
These whitelisting/blacklisting techniques can be used together.
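When the techniques are combined, each incoming metric needs a single accept/reject decision. The sketch below encodes one plausible precedence, which is an assumption and not taken from the AMS source: a blacklisted app is rejected, a whitelisted app is accepted, and any other app's metrics are either allowed (no whitelist file) or checked against the whitelist file's names and wildcard-style patterns. The function name and parameters are hypothetical.

```python
from fnmatch import fnmatchcase


def accept_metric(app_id, metric_name, app_blacklist, app_whitelist,
                  metric_names=None, metric_patterns=()):
    """Decide whether a metric is kept, under an assumed precedence order."""
    # 1. Blacklisted app: drop all of its metrics (assumed to win if an app is in both lists).
    if app_id in app_blacklist:
        return False
    # 2. Whitelisted app: keep all of its metrics without consulting the whitelist file.
    if app_id in app_whitelist:
        return True
    # 3. No metric whitelist file configured: other apps' metrics are allowed.
    if metric_names is None:
        return True
    # 4. Otherwise the whitelist file decides: exact name or wildcard pattern.
    return metric_name in metric_names or any(
        fnmatchcase(metric_name, pattern) for pattern in metric_patterns
    )


if __name__ == "__main__":
    blacklist, whitelist = {"hbase", "namenode"}, {"nimbus", "datanode"}
    names, patterns = {"cpu_user"}, ["jvm.JvmMetrics*"]
    print(accept_metric("hbase", "cpu_user", blacklist, whitelist, names, patterns))  # False
    print(accept_metric("nimbus", "anything", blacklist, whitelist, names, patterns))  # True
    print(accept_metric("HOST", "cpu_user", blacklist, whitelist, names, patterns))  # True
    print(accept_metric("HOST", "disk_free", blacklist, whitelist, names, patterns))  # False
```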