To accomplish this, you need to do the following:
Heron supports custom metric exporters from the Metrics Manager. You can either build your own Graphite metrics sink or use the provided Graphite sink.
In addition to the topology-specific data available from Heron, much more data is available directly from Aurora and the Linux kernel. These additional metrics can help identify many operational problems, such as CPU throttling or crashes.
Diamond has the following relevant collectors available:
A convenient way to view topology-specific metrics in Graphite is to create a scripted dashboard in Grafana. The scripted dashboard should accept information such as the topology name as query arguments, which allows the Heron UI to deep link to a specific topology's dashboard.
Suggested targets for the scripted dashboard include:
'aliasByNode(sortByMaxima(highestAverage(heron.' + topology_name + '.stmgr.stmgr-*.time_spent_back_pressure_by_compid.*, 5)), 5)'
Fail Count by Component:
'sumSeriesWithWildcards(aliasByNode(heron.' + topology_name + '.*.*.fail-count.default,2),3)'
CPU Throttling periods:
'aliasByNode(nonNegativeDerivative(mesos.tasks.prod.*.' + topology_name + '.*.cpu.nr_throttled), 4,5)'
'aliasByNode(drawAsInfinite(maximumAbove(removeAboveValue(heron.' + topology_name + '.*.*.jvm.uptime-secs, 60),1)),2,3)'
Top 5 worst GC components:
'aliasByNode(highestMax(nonNegativeDerivative(heron.' + topology_name + '.*.*.jvm.gc-time-ms.PS-*),5), 2,3,6)'
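The targets above could be assembled into a dashboard along the following lines. This is a minimal sketch, not a definitive implementation: the `topology` query argument name, the `buildDashboard` helper, the dashboard title, and the row/panel layout fields are assumptions for illustration, and the layout follows the row-based schema of older Grafana releases, so adjust it to the Grafana version you run. Only three of the suggested targets are shown; the others can be added to the `panels` array in the same way.

```javascript
// Sketch of a scripted-dashboard body. In Grafana the query arguments arrive
// in a global ARGS object; here they are passed in as a parameter so the
// function can be exercised on its own.
function buildDashboard(args) {
  // Assumed query argument name: ?topology=<name>
  var topologyName = args.topology;
  if (typeof topologyName !== 'string' || topologyName.length === 0) {
    throw new Error('missing "topology" query argument');
  }

  // Titles and target expressions are taken from the suggested targets above.
  var panels = [
    {
      title: 'Fail Count by Component',
      target: 'sumSeriesWithWildcards(aliasByNode(heron.' + topologyName +
        '.*.*.fail-count.default,2),3)'
    },
    {
      title: 'CPU Throttling periods',
      target: 'aliasByNode(nonNegativeDerivative(mesos.tasks.prod.*.' + topologyName +
        '.*.cpu.nr_throttled), 4,5)'
    },
    {
      title: 'Top 5 worst GC components',
      target: 'aliasByNode(highestMax(nonNegativeDerivative(heron.' + topologyName +
        '.*.*.jvm.gc-time-ms.PS-*),5), 2,3,6)'
    }
  ];

  // One row per panel; the layout values here are illustrative assumptions.
  return {
    title: 'Heron: ' + topologyName,
    time: { from: 'now-6h', to: 'now' },
    rows: panels.map(function (p) {
      return {
        title: p.title,
        height: '300px',
        panels: [{ title: p.title, type: 'graph', span: 12, targets: [{ target: p.target }] }]
      };
    })
  };
}
```

In an actual scripted dashboard file, the script would typically end with `return buildDashboard(ARGS);` so that Grafana receives the dashboard object built from the URL's query arguments.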
Finally, you can configure the Heron UI to deep link to scripted dashboards by specifying an [observability URL format](https://github.com/apache/incubator-heron/blob/master/heron/tools/config/src/yaml/tracker/heron_tracker.yaml) (`viz.url.format`) in the Heron Tracker's configuration. This adds topology-specific buttons to the Heron UI that let you drill down into your Grafana dashboards.
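To make the deep-link idea concrete, the sketch below shows how a URL format containing a topology placeholder could expand into a link to a scripted dashboard. It is an illustration only, not the Heron Tracker's actual code: the `${TOPOLOGY}` placeholder name, the `grafana.example.com` host, the `heron.js` script name, and the `expandVizUrl` helper are all assumptions. Check the linked `heron_tracker.yaml` for the placeholders the Tracker really supports.

```javascript
// Hypothetical format string; host, script name, and placeholder are assumed.
var vizUrlFormat =
  'http://grafana.example.com/dashboard/script/heron.js?topology=${TOPOLOGY}';

// Replace each ${NAME} placeholder with the URL-encoded value for NAME.
// Placeholders without a supplied value are left untouched.
function expandVizUrl(format, values) {
  return format.replace(/\$\{([A-Z_]+)\}/g, function (match, key) {
    return Object.prototype.hasOwnProperty.call(values, key)
      ? encodeURIComponent(values[key])
      : match;
  });
}

var link = expandVizUrl(vizUrlFormat, { TOPOLOGY: 'wordcount' });
```

With these assumed values, `link` points at the scripted dashboard with `topology=wordcount` as its query argument, which is the argument the dashboard script would read to build its targets.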