The Heron Tracker is a web service that continuously gathers a wide variety of information about Heron topologies in your cluster(s) and exposes that information through a JSON REST API. More on the role of the Tracker can be found here.
The Tracker can run within your Heron cluster (e.g. Mesos or Aurora) or outside of it, provided that the machine on which it runs has access to your Heron cluster.
You can start the Heron Tracker by running the heron-tracker executable, which is generated when you compile Heron:
$ cd /path/to/heron/binaries
$ ./heron-tracker
By default, the Tracker runs on port 8888. You can specify a different port using the --port flag:
$ ./heron-tracker --port=1234
All Heron Tracker endpoints return a JSON object with the following information:
status --- One of the following: success, failure.
executiontime --- The time it took to return the HTTP result, in seconds.
message --- Some endpoints return special messages in this field for certain requests. Often, this field will be an empty string.
result --- The result payload of the request. The contents will depend on the endpoint.
version --- The Heron release version used to build the currently running Tracker executable.

/ (redirects to /topologies)
/machines
/topologies
/topologies/states
/topologies/info
/topologies/logicalplan
/topologies/physicalplan
/topologies/executionstate
/topologies/metrics
/topologies/metricstimeline
/topologies/metricsquery
/topologies/exceptionsummary
/topologies/pid
/topologies/jstack
/topologies/jmap
/topologies/histo
All of these endpoints are documented in the sections below.
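Because every endpoint wraps its payload in the same envelope (status, executiontime, message, result, version), client code can unwrap responses in one place. The following is a minimal sketch, assuming the response body has already been decoded from JSON into a Python dict; the names TrackerError and unwrap are hypothetical helpers, not part of Heron, and the sample values are hand-written for illustration.

```python
class TrackerError(Exception):
    """Raised when a Tracker response reports a status other than "success"."""


def unwrap(envelope):
    """Return the result payload of a decoded Tracker response, or raise on failure."""
    if envelope.get("status") != "success":
        # The message field may be an empty string, so fall back to a generic text.
        raise TrackerError(envelope.get("message") or "Tracker request failed")
    return envelope.get("result")


# Hand-written sample envelope (illustrative values, not real Tracker output).
sample = {
    "status": "success",
    "executiontime": 0.002,
    "message": "",
    "result": {"datacenter1": {"prod": ["user_topology_1"]}},
    "version": "x.y.z",
}

payload = unwrap(sample)
print(payload)
```

Checking status before touching result keeps endpoint-specific code free of repeated error handling.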
/machines
Returns JSON describing all currently available machines, sorted by (1) data center (if you're running Heron in multiple data centers), (2) environment, and (3) topology.
$ curl http://heron-tracker-url/machines
dc --- The data center. If the data center you provide is valid, the JSON payload will list machines only in that data center. You will receive a 404 if the data center is invalid. Example:
$ curl "http://heron-tracker-url/machines?dc=datacenter1"
environ --- The environment. Must be either devel or prod; otherwise you will receive a 404. Example:
$ curl "http://heron-tracker-url/machines?environ=devel"
topology (repeated) --- Both dc and environ are required if the topology parameter is present. Example:
$ curl "http://heron-tracker-url/machines?topology=mytopology1&dc=datacenter1&environ=prod"
The value of the result field should look something like this:
{
  <dc1>: {
    <environ1>: {
      <topology1>: [machine1, machine2, ...],
      <topology2>: [...],
    },
    <environ2>: {...},
    ...
  },
  <dc2>: {...}
}
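The parameter rules for /machines (topology may be repeated, and it requires both dc and environ) can be captured in a small URL builder. This is a sketch only: machines_url is a hypothetical helper, and the host is the same heron-tracker-url placeholder used in the curl examples.

```python
from urllib.parse import urlencode

BASE_URL = "http://heron-tracker-url"  # placeholder host, as in the curl examples


def machines_url(dc=None, environ=None, topologies=()):
    """Build a /machines URL, enforcing that topology needs both dc and environ."""
    if topologies and not (dc and environ):
        raise ValueError("dc and environ are required when topology is given")
    params = []
    if dc:
        params.append(("dc", dc))
    if environ:
        params.append(("environ", environ))
    # topology is a repeated parameter: emit one key/value pair per name.
    params.extend(("topology", name) for name in topologies)
    query = urlencode(params)
    return f"{BASE_URL}/machines" + (f"?{query}" if query else "")


print(machines_url("datacenter1", "prod", ["mytopology1", "mytopology2"]))
```

Building the query from a list of pairs, rather than a dict, is what allows the same key to appear more than once.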
/topologies
Returns JSON describing all currently available topologies.
dc --- The data center. If the data center you provide is valid, the JSON payload will list topologies only in that data center. You will receive a 404 if the data center is invalid. Example:
$ curl "http://heron-tracker-url/topologies?dc=datacenter1"
environ --- Lists topologies by the environment in which they're running. Example:
$ curl "http://heron-tracker-url/topologies?environ=prod"
The value of the result field should look something like this:
{
  <dc1>: {
    <environ1>: [topology1, topology2, ...],
    <environ2>: [...],
  },
  <dc2>: {...}
}
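The nested result of /topologies (data center, then environment, then a list of topology names) is often easier to work with as a flat sequence. A minimal sketch, assuming the result field has already been decoded into a dict of that shape; iter_topologies is a hypothetical helper and the sample data is invented for illustration.

```python
def iter_topologies(result):
    """Yield (dc, environ, topology) tuples from a /topologies result payload."""
    for dc, environs in result.items():
        for environ, names in environs.items():
            for name in names:
                yield dc, environ, name


# Invented sample payload that follows the documented shape.
sample_result = {
    "datacenter1": {"prod": ["topology1", "topology2"], "devel": ["topology3"]},
    "datacenter2": {"prod": []},
}

for dc, environ, name in iter_topologies(sample_result):
    print(f"{dc}/{environ}/{name}")
```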
/topologies/states
The current execution state of topologies in a cluster. Topologies can be grouped by data center, environment, or both.
dc --- The data center. If the data center you provide is valid, the JSON payload will list topologies only in that data center. You will receive a 404 if the data center is invalid. Example:
$ curl "http://heron-tracker-url/topologies/states?dc=datacenter1"
environ --- Lists topologies by the environment in which they're running. Example:
$ curl "http://heron-tracker-url/topologies/states?environ=prod"
The value of the result field should look something like this:
{
  <dc1>: {
    <environ1>: {
      <topology1>: { <execution state> },
      <topology2>: {...},
      ...
    },
    <environ2>: {...},
    ...
  },
  <dc2>: {...}
}
Each execution state object lists the following:
release_username --- The user that generated the Heron release for the topology
has_tmaster_location --- Whether the topology's Topology Master currently has a location
release_tag --- This is a legacy
uploader_version --- TODO
dc --- The data center in which the topology is running
jobname --- TODO
release_version --- TODO
environ --- The environment in which the topology is running
submission_user --- The user that submitted the topology
submission_time --- The time at which the topology was submitted (timestamp in milliseconds)
role --- TODO
has_physical_plan --- Whether the topology currently has a physical plan
/topologies/info
dc --- The data center in which the topology is running
environ --- The environment in which the topology is running
topology --- The name of the topology
$ curl "http://heron-tracker-url/topologies/info?dc=datacenter1&environ=prod&topology=user_topology_1"
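Endpoints such as /topologies/info identify a single topology through the same three parameters: dc, environ, and topology. The sketch below builds such a URL programmatically; topology_url is a hypothetical helper, the default host is the placeholder from the curl example, and treating all three parameters as mandatory is an assumption based on how the examples on this page are written.

```python
from urllib.parse import urlencode


def topology_url(endpoint, dc, environ, topology, base_url="http://heron-tracker-url"):
    """Build a /topologies/<endpoint> URL for one topology."""
    for name, value in (("dc", dc), ("environ", environ), ("topology", topology)):
        if not value:
            raise ValueError(f"{name} is required")
    # urlencode also percent-escapes any characters that are unsafe in a query string.
    query = urlencode({"dc": dc, "environ": environ, "topology": topology})
    return f"{base_url}/topologies/{endpoint}?{query}"


print(topology_url("info", "datacenter1", "prod", "user_topology_1"))
```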
The value of the result field should list the following:
name --- The name of the topology
tmaster_location --- Information about the machine on which the topology's Topology Master (TM) is running, including the following: the controller port, the host, the master port, the stats port, and the ID of the TM.
physical_plan --- A JSON representation of the physical plan of the topology, which includes configuration information for the topology as well as information about all current spouts, bolts, state managers, and instances.
logical_plan --- A JSON representation of the logical plan of the topology, which includes information about all of the spouts and bolts in the topology.
execution_state --- The execution state of the topology. For more on execution state, see the section regarding the /topologies/states endpoint above.
/topologies/logicalplan
Returns a JSON object for the logical plan of a topology.
dc --- The data center in which the topology is running
environ --- The environment in which the topology is running
topology --- The name of the topology
$ curl "http://heron-tracker-url/topologies/logicalplan?dc=datacenter1&environ=prod&topology=user_topology_1"
The value of the result field should look something like this:
TODO
spouts --- A set of JSON objects representing each spout in the topology. The following information is listed for each spout:
  source --- The source of tuples for the spout.
  version --- The Heron release version for the topology.
  type --- The type of the spout, e.g. kafka, kestrel, etc.
  outputs --- A list of streams to which the spout outputs tuples.
bolts --- A set of JSON objects representing each bolt in the topology.
  outputs --- A list of outputs for the bolt.
  inputs --- A list of inputs for the bolt.
/topologies/physicalplan
Returns a JSON object for the physical plan of a topology.
dc --- The data center in which the topology is running
environ --- The environment
topology --- The name of the topology
$ curl "http://heron-tracker-url/topologies/physicalplan?dc=datacenter1&environ=prod&topology=user_topology_1"
/topologies/executionstate
The current execution state of a given topology.
dc --- The data center in which the topology is running
environ --- The environment in which the topology is running
topology --- The name of the topology
$ curl "http://heron-tracker-url/topologies/executionstate?dc=datacenter1&environ=prod&topology=user_topology_1"
The value of the result field will be a JSON object akin to the one documented in a section above.
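As an illustration of consuming an execution state object, the sketch below converts submission_time (documented above as a timestamp in milliseconds) into a UTC datetime and summarizes a few fields. The helper name describe_execution_state and the derived "ready" flag are assumptions made for this example, not Tracker concepts, and the sample values are invented.

```python
from datetime import datetime, timezone


def describe_execution_state(state):
    """Summarize a decoded execution state object."""
    # submission_time is documented as milliseconds, so divide by 1000 for seconds.
    submitted = datetime.fromtimestamp(state["submission_time"] / 1000, tz=timezone.utc)
    return {
        "submitted_by": state["submission_user"],
        "submitted_at": submitted.isoformat(),
        # Hypothetical convenience flag: both pieces of runtime information are present.
        "ready": bool(state["has_tmaster_location"] and state["has_physical_plan"]),
    }


# Invented sample containing only the fields this sketch reads.
sample_state = {
    "submission_user": "alice",
    "submission_time": 1700000000000,
    "has_tmaster_location": True,
    "has_physical_plan": True,
}

print(describe_execution_state(sample_state))
```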
/topologies/metrics
/topologies/metricstimeline
/topologies/metricsquery
/topologies/exceptionsummary
/topologies/pid
/topologies/jstack
/topologies/jmap
dc --- The data center in which the topology is running
environ --- The environment in which the topology is running
topology --- The name of the topology
instance --- The instance ID of the desired Heron instance
/topologies/histo
Returns JSON containing a histogram
dc --- The data center
environ --- The environment
topology --- The name of the topology
instance --- The instance ID of the desired Heron instance
The result field should look something like this:
{
  "command": "<command executed at server>",
  "stdout": "<text from stdout from executing the command>",
  "stderr": "<text from stderr from executing the command>"
}
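A result of this shape carries the raw text produced by the command, so a client will usually want to split it into lines before inspecting it. The following sketch assumes the result field has been decoded into a dict with the three keys shown above; split_command_result is a hypothetical helper and the sample text is invented.

```python
def split_command_result(result):
    """Split the stdout and stderr text of a command result into lists of lines."""
    return {
        "command": result.get("command", ""),
        "stdout": result.get("stdout", "").splitlines(),
        "stderr": result.get("stderr", "").splitlines(),
    }


# Invented sample that follows the documented shape.
sample_result = {
    "command": "<command executed at server>",
    "stdout": "line one\nline two\n",
    "stderr": "",
}

parsed = split_command_result(sample_result)
for line in parsed["stdout"]:
    print(line)
```

The sketch deliberately does not treat a non-empty stderr as a failure, since a command may write warnings there while still succeeding.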