All Heron Tracker endpoints return a JSON object with the following information:
status --- One of the following: success, failure.
executiontime --- The time taken to return the HTTP result, in seconds.
message --- Some endpoints return special messages in this field for certain requests. Often, this field will be an empty string. A failure status will always have a message.
result --- The result payload of the request. The contents will depend on the endpoint.
version --- The Tracker API version.

/ (redirects to /topologies)
/clusters
/topologies
/topologies/states
/topologies/info
/topologies/logicalplan
/topologies/physicalplan
/topologies/executionstate
/topologies/schedulerlocation
/topologies/metrics
/topologies/metricstimeline
/topologies/metricsquery
/topologies/containerfiledata
/topologies/containerfilestats
/topologies/exceptions
/topologies/exceptionsummary
/topologies/pid
/topologies/jstack
/topologies/jmap
/topologies/histo
/machines
All of these endpoints are documented in the sections below.
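Because every endpoint wraps its payload in the same envelope (status, message, result), a client can unwrap responses in one place. The following is a minimal Python sketch of that idea; TrackerError and unwrap are hypothetical names, and the sample body is hand-written for illustration rather than captured from a real Tracker.

```python
import json


class TrackerError(Exception):
    """Raised when the Tracker reports a status other than "success"."""


def unwrap(body: str):
    """Parse a Tracker response body and return its result payload."""
    envelope = json.loads(body)
    if envelope.get("status") != "success":
        # The documentation states that a failure status always has a message.
        raise TrackerError(envelope.get("message", ""))
    return envelope["result"]


# Hand-written sample body (an assumption, not real Tracker output).
sample = (
    '{"status": "success", "executiontime": 0.002, "message": "", '
    '"result": ["cluster1"], "version": "x.y.z"}'
)
print(unwrap(sample))
```

Raising on failure keeps the calling code free of repeated status checks; callers that prefer to inspect the message can catch TrackerError instead.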
The /clusters endpoint returns a JSON list of all the clusters.
The /topologies endpoint returns JSON describing all currently available topologies.
$ curl "http://heron-tracker-url/topologies?cluster=cluster1&environ=devel"
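The same request can be assembled programmatically. The sketch below builds the /topologies URL and adds only the filters that were supplied, since both cluster and environ are optional for this endpoint. The function name topologies_url is hypothetical, and the host is the placeholder used in the curl examples.

```python
from urllib.parse import urlencode

# Placeholder host taken from the curl examples in this document.
BASE_URL = "http://heron-tracker-url"


def topologies_url(cluster=None, environ=None):
    """Build a /topologies URL, adding only the filters that were given."""
    params = {}
    if cluster is not None:
        params["cluster"] = cluster
    if environ is not None:
        params["environ"] = environ
    query = urlencode(params)
    return f"{BASE_URL}/topologies" + (f"?{query}" if query else "")


print(topologies_url(cluster="cluster1", environ="devel"))
```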
cluster (optional) --- The cluster parameter can be used to filter topologies that are running in this cluster.
environ (optional) --- The environment parameter can be used to filter topologies that are running in this environment.

The /topologies/logicalplan endpoint returns a JSON representation of the logical plan of a topology.
$ curl "http://heron-tracker-url/topologies/logicalplan?cluster=cluster1&environ=devel&topology=topologyName"
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.

The resulting JSON contains the following:
spouts --- A set of JSON objects representing each spout in the topology. The following information is listed for each spout:
    source --- The source of tuples for the spout.
    type --- The type of the spout, e.g. kafka, kestrel, etc.
    outputs --- A list of streams to which the spout outputs tuples.
bolts --- A set of JSON objects representing each bolt in the topology. The following information is listed for each bolt:
    outputs --- A list of streams to which the bolt outputs tuples.
    inputs --- A list of inputs for the bolt. Each input is represented by a JSON dictionary containing the following information:
        component_name --- Name of the component this bolt is receiving tuples from.
        stream_name --- Name of the stream from which the tuples are received.
        grouping --- Type of grouping used to receive tuples, for example SHUFFLE or FIELDS.

The /topologies/physicalplan endpoint returns a JSON representation of the physical plan of a topology.
$ curl "http://heron-tracker-url/topologies/physicalplan?cluster=datacenter1&environ=prod&topology=topologyName"
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.

The resulting JSON contains the following information:
stmgrs --- A list of JSON dictionaries containing the following information for each stream manager:
    host --- Hostname of the machine this container is running on.
    pid --- Process ID of the stream manager.
    cwd --- Absolute path to the directory from which the container was launched.
    joburl --- URL to browse the cwd through heron-shell.
    shell_port --- Port to access heron-shell.
    logfiles --- URL to browse instance log files through heron-shell.
    id --- ID for this stream manager.
    port --- Port at which this stream manager accepts connections from other stream managers.
    instance_ids --- List of instance IDs that constitute this container.
instances --- A list of JSON dictionaries containing the following information for each instance:
    id --- Instance ID.
    name --- Component name of this instance.
    logfile --- Link to the log file for this instance, which can be read through heron-shell.
    stmgrId --- ID of this instance's stream manager.
config --- Various topology configs. Some examples are:
    topology.message.timeout.secs --- Time after which a tuple should be considered as failed.
    topology.acking --- Whether acking is enabled or not.

The /topologies/schedulerlocation endpoint returns a JSON representation of the scheduler location of the topology.
$ curl "http://heron-tracker-url/topologies/schedulerlocation?cluster=datacenter1&environ=prod&topology=topologyName"
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.

The SchedulerLocation mainly contains the link to the job on the scheduler, for example, the Aurora page for the job.

The /topologies/executionstate endpoint returns a JSON representation of the execution state of the topology.
$ curl "http://heron-tracker-url/topologies/executionstate?cluster=datacenter1&environ=prod&topology=topologyName"
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.

Each execution state object lists the following:
cluster --- The cluster in which the topology is running.
environ --- The environment in which the topology is running.
role --- The role with which the topology was launched.
jobname --- Same as the topology name.
submission_time --- The time at which the topology was submitted.
submission_user --- The user that submitted the topology (can be the same as role).
release_username --- The user that generated the Heron release for the topology.
release_version --- Release version.
has_physical_plan --- Whether the topology has a physical plan.
has_tmaster_location --- Whether the topology has a Topology Master Location.
has_scheduler_location --- Whether the topology has a Scheduler Location.
viz --- Metric visualization UI URL for the topology, if it was configured.

The /topologies/states endpoint returns a JSON list of the execution states of topologies in all clusters.
$ curl "http://heron-tracker-url/topologies/states?cluster=cluster1&environ=devel"
cluster (optional) --- The cluster parameter can be used to filter topologies that are running in this cluster.
environ (optional) --- The environment parameter can be used to filter topologies that are running in this environment.

The /topologies/info endpoint returns a JSON dictionary containing the logical plan, physical plan, execution state, scheduler location, and TMaster location for a topology, as described above. TMasterLocation is the location of the TMaster, including its host, port, and the heron-shell port that it exposes.
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.

The /topologies/containerfilestats endpoint returns the file stats for a container. This is the output of the command ls -lh when run in the directory where the heron-controller launched all the processes.
This endpoint is mainly used by the UI for exploring files in a container.
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
container (required) --- Container ID.
path (optional) --- Path relative to the directory where heron-controller is launched. Paths are not allowed to start with a / or contain a .. sequence.

The /topologies/containerfiledata endpoint returns the data of a file in a container.
This endpoint is mainly used by the UI for exploring files in a container.
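The path restriction stated for the container file endpoints (no leading / and no ..) can also be checked on the client before a request is sent. The sketch below takes the strictest reading of that rule and rejects any path containing the two-character sequence "..", even inside a file name. The function name is hypothetical, and the Tracker's own validation may differ.

```python
def is_allowed_path(path: str) -> bool:
    """Return True if path satisfies the documented path restriction."""
    # Assumption: "contain a .." is read as a plain substring check.
    return not path.startswith("/") and ".." not in path


print(is_allowed_path("log-files/container_1.log"))
print(is_allowed_path("../other-container"))
```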
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
container (required) --- Container ID.
path (required) --- Path to the file relative to the directory where heron-controller is launched. Paths are not allowed to start with a / or contain a .. sequence.
offset (required) --- Offset from the beginning of the file.
length (required) --- Number of bytes to be returned.

The /topologies/metrics endpoint returns a JSON map of instances of the topology to their respective metrics. To filter the instances returned, use the instance parameter discussed below.
Note that these metrics come from TMaster, which holds only the last 3 hours of minutely data, as well as cumulative values. If the interval is greater than 10800 seconds, the values will be for all-time metrics.
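The metricname parameter of this endpoint is marked as repeated, which the sketch below interprets as the same key appearing once per metric in the query string. That interpretation, the treatment of instance as repeatable, and the function name metrics_url are assumptions; the host is the placeholder from the curl examples.

```python
from urllib.parse import urlencode

# Placeholder host taken from the curl examples in this document.
BASE_URL = "http://heron-tracker-url"


def metrics_url(cluster, environ, topology, component, metricnames,
                interval=None, instances=()):
    """Build a /topologies/metrics URL with one metricname pair per metric."""
    pairs = [
        ("cluster", cluster),
        ("environ", environ),
        ("topology", topology),
        ("component", component),
    ]
    # A list of pairs lets the same key appear more than once.
    pairs += [("metricname", name) for name in metricnames]
    if interval is not None:
        pairs.append(("interval", str(interval)))
    pairs += [("instance", instance) for instance in instances]
    return f"{BASE_URL}/topologies/metrics?{urlencode(pairs)}"


url = metrics_url("cluster1", "devel", "topologyName", "component1",
                  ["__emit-count/stream1", "__emit-count/default"],
                  interval=3600)
print(url)
```

Passing a list of pairs to urlencode, rather than a dictionary, is what allows the repeated key; it also percent-encodes the / in metric names.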
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
component (required) --- Component name.
metricname (required, repeated) --- Names of metrics to fetch.
interval (optional) --- The number of seconds for which the metrics should be fetched (max 10800 seconds).
instance (optional) --- IDs of the instances. If not present, metrics are returned for all the instances.

The /topologies/metricstimeline endpoint returns a JSON map of instances of the topology to their respective metrics timeline. To filter the instances returned, use the instance parameter discussed below.
The difference between this endpoint and the /metrics endpoint above is that /metrics reports the cumulative value over the interval provided, whereas /metricstimeline reports minutely values for each metricname for each instance.
Note that these metrics come from TMaster, which holds only the last 3 hours of minutely data, as well as cumulative all-time values. If the starttime is more than 3 hours ago, those minutes will not be part of the response.
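Since minutes older than 3 hours are not returned, a client may want to clamp its requested window before calling the endpoint. The sketch below does this under the assumption that starttime and endtime are expressed in seconds since the Unix epoch, which this document does not state explicitly. The function name clamp_window is hypothetical.

```python
import time

THREE_HOURS = 3 * 60 * 60  # 10800 seconds, the retention window described above


def clamp_window(starttime, endtime, now=None):
    """Clamp [starttime, endtime] to the last 3 hours.

    Returns a (start, end) tuple, or None if nothing in the window is retained.
    Assumes all values are seconds since the Unix epoch.
    """
    now = int(time.time()) if now is None else now
    start = max(starttime, now - THREE_HOURS)
    end = min(endtime, now)
    if start >= end:
        # endtime must be greater than starttime.
        return None
    return start, end


print(clamp_window(0, 100000, now=100000))
```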
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
component (required) --- Component name.
metricname (required, repeated) --- Names of metrics to fetch.
starttime (required) --- Start time for the metrics (must be within the last 3 hours).
endtime (required) --- End time for the metrics (must be within the last 3 hours, and greater than starttime).
instance (optional) --- IDs of the instances. If not present, metrics are returned for all the instances.

The /topologies/metricsquery endpoint executes the metrics query for the topology and returns the result in the form of a minutely time series. A detailed description of the query language is given below.
Note that these metrics come from TMaster, which holds only the last 3 hours of minutely data, as well as cumulative all-time values. If the starttime is more than 3 hours ago, those minutes will not be part of the response.
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
starttime (required) --- Start time for the metrics (must be within the last 3 hours).
endtime (required) --- End time for the metrics (must be within the last 3 hours, and greater than starttime).
query (required) --- Query to be executed.

The /topologies/exceptionsummary endpoint returns a summary of the exceptions for the component of the topology. Duplicate exceptions are combined, and the summary includes the number of occurrences, the first occurrence time, and the latest occurrence time.
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
component (required) --- Component name.
instance (optional) --- IDs of the instances. If not present, exceptions are returned for all the instances.

The /topologies/exceptions endpoint returns all exceptions for the component of the topology.
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
component (required) --- Component name.
instance (optional) --- IDs of the instances. If not present, exceptions are returned for all the instances.

The /topologies/pid endpoint returns the PID of the instance JVM process.
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
instance (required) --- Instance ID.

The /topologies/jstack endpoint returns the thread dump of the instance JVM process.
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
instance (required) --- Instance ID.

The /topologies/jmap endpoint issues the jmap command for the instance and saves the result in a file. It returns the path to the file, which can be downloaded externally.
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
instance (required) --- Instance ID.

The /topologies/histo endpoint returns a histogram for the instance JVM process.
cluster (required) --- The cluster in which the topology is running.
environ (required) --- The environment in which the topology is running.
topology (required) --- The name of the topology.
instance (required) --- Instance ID.

The /machines endpoint returns JSON describing all machines that topologies are running on.
$ curl "http://heron-tracker-url/machines?topology=mytopology1&cluster=cluster1&environ=prod"
cluster (optional) --- The cluster parameter can be used to filter machines that are running the topologies in this cluster only.
environ (optional) --- The environment parameter can be used to filter machines that are running the topologies in this environment only.
topology (optional, repeated) --- Name of the topology. Both cluster and environ are required if the topology parameter is present.

Metrics queries are useful when some kind of aggregated value is required. For example, to find the total number of tuples emitted by a spout, the SUM operator can be used instead of fetching metrics for all the instances of the corresponding component and then summing them.
TS(componentName, instance, metricName)
Example:
TS(component1, *, __emit-count/stream1)
Time Series Operator. This is the basic operator that is responsible for getting metrics from TMaster. Accepts a list of 3 elements: componentName, instance, and metricName.
Returns a univariate time series if a single instance ID is given; otherwise returns a multivariate time series.
DEFAULT(0, TS(component1, *, __emit-count/stream1)) <-- If the second operator returns more than one timeline, so will the DEFAULT operator.
DEFAULT(100.0, SUM(TS(component2, *, __emit-count/default))) <-- Second operator can be any operator
Default Operator. This operator is responsible for filling missing values in the metrics timeline. Must have 2 arguments: a constant (0 and 100.0 in the examples above) followed by an operator.
Returns a univariate or multivariate time series, based on what the second operator is.
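To make the fill behaviour of DEFAULT concrete, here is a small Python sketch that runs on in-memory data rather than against a Tracker. It assumes a multivariate time series can be modelled as a dictionary mapping instance IDs to {timestamp: value} dictionaries and that the set of expected timestamps is known. Both the representation and the function name default_fill are illustrative assumptions, not the Tracker's implementation.

```python
def default_fill(constant, timelines, timestamps):
    """Fill every expected timestamp missing from a timeline with constant."""
    return {
        instance: {ts: timeline.get(ts, constant) for ts in timestamps}
        for instance, timeline in timelines.items()
    }


# Two instances over three minutely timestamps; all values are made up.
timelines = {
    "instance1": {60: 5.0, 180: 7.0},          # the value for minute 120 is missing
    "instance2": {60: 1.0, 120: 2.0, 180: 3.0},
}
filled = default_fill(0, timelines, [60, 120, 180])
print(filled["instance1"])
```

As in the operator description, the output has as many timelines as the input: one instance in gives one timeline out, several give several.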
SUM(TS(component1, instance1, metric1), DEFAULT(0, TS(component1, *, metric2)))
Sum Operator. This operator is used to take the sum of all argument time series. It can have any number of arguments, each of which must be either a constant or an operator.
Returns only a single timeline representing the sum of all time series for each timestamp. Note that the “instance” attribute is not included in the result.
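The per-timestamp summation that SUM performs can be sketched on in-memory data. The example assumes each argument is a multivariate time series modelled as a dictionary of instance ID to {timestamp: value}, handles operator results only (not constant arguments), and treats a timestamp missing from one timeline as contributing nothing. These choices and the name sum_timelines are assumptions made for illustration.

```python
from collections import defaultdict


def sum_timelines(*arguments):
    """Collapse all timelines of all arguments into one timeline keyed by timestamp."""
    total = defaultdict(float)
    for timelines in arguments:
        for timeline in timelines.values():
            for ts, value in timeline.items():
                total[ts] += value
    # The result has no instance key, only timestamps.
    return dict(total)


first = {"instance1": {60: 1.0, 120: 2.0}}
second = {"instance2": {60: 3.0}, "instance3": {60: 4.0, 120: 5.0}}
print(sum_timelines(first, second))
```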
MAX(100, TS(component1, *, metric1))
Max Operator. This operator is used to find the max of all argument operators for each individual timestamp. Each argument must be either a constant or an operator.
Returns only a single timeline representing the max of all the time series for each timestamp. Note that the “instance” attribute is not included in the result.
PERCENTILE(99, TS(component1, *, metric1))
Percentile Operator. This operator is used to find a quantile of all timelines returned by the arguments, for each timestamp. It is a more general form of MAX: PERCENTILE(100, TS...) is equivalent to MAX(TS...). Each argument must be either a constant or an operator. The first argument must always be the required quantile.
Returns only a single timeline representing the quantile of all the time series for each timestamp. Note that the “instance” attribute is not included in the result.
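The relationship between PERCENTILE and MAX can be checked with a short in-memory sketch. It uses the nearest-rank definition of a percentile, which is an assumption (this document does not say which method the Tracker uses), and models a multivariate time series as a dictionary of instance ID to {timestamp: value}. The function name percentile_timeline is hypothetical.

```python
import math


def percentile_timeline(quantile, timelines):
    """Nearest-rank percentile across all timelines, computed per timestamp."""
    values_by_ts = {}
    for timeline in timelines.values():
        for ts, value in timeline.items():
            values_by_ts.setdefault(ts, []).append(value)
    result = {}
    for ts, values in values_by_ts.items():
        values.sort()
        # Nearest rank: the smallest value with at least quantile% of values at or below it.
        rank = max(1, math.ceil(quantile / 100 * len(values)))
        result[ts] = values[rank - 1]
    return result


timelines = {
    "instance1": {60: 1.0, 120: 9.0},
    "instance2": {60: 5.0, 120: 2.0},
    "instance3": {60: 3.0},
}
print(percentile_timeline(100, timelines))
print(percentile_timeline(50, timelines))
```

With a quantile of 100 the rank is always the last element of the sorted values, so the result is the per-timestamp maximum, matching the equivalence stated above.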
DIVIDE(TS(component1, *, metrics1), 100)
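Divide Operator. Accepts two arguments, each of which can be univariate or multivariate. Each must be either a constant or an operator.
Three main cases are: both arguments are multivariate, one is univariate and the other is multivariate, or both are univariate.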
MULTIPLY(10, TS(component1, *, metrics1))
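Multiply Operator. Has the same conditions as the Divide operator, to keep the API simple. Accepts two arguments, each of which can be univariate or multivariate. Each must be either a constant or an operator.
Three main cases are: both arguments are multivariate, one is univariate and the other is multivariate, or both are univariate.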
SUBTRACT(TS(component1, instance1, metrics1), TS(component1, instance1, metrics2))
SUBTRACT(TS(component1, instance1, metrics1), 100)
Subtract Operator. Has the same conditions as the Divide operator, to keep the API simple. Accepts two arguments, each of which can be univariate or multivariate. Each must be either a constant or an operator.
Three main cases are: both arguments are multivariate, one is univariate and the other is multivariate, or both are univariate.
RATE(SUM(TS(component1, *, metrics1)))
RATE(TS(component1, *, metrics2))
Rate Operator. This operator is used to find the rate of change for all time series. Accepts only a single argument, which must be an operator that returns a univariate or multivariate time series. Returns a univariate or multivariate time series based on the argument, with each timestamp value corresponding to the rate of change for that timestamp.
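Query strings built from these operators contain parentheses, commas, asterisks, and slashes, so they need to be URL-encoded when passed as the query parameter of /topologies/metricsquery. The sketch below composes a query from small helper functions and encodes it. The helpers ts and op are hypothetical, the host is the placeholder from the curl examples, and the starttime and endtime values are made-up numbers assumed to be seconds since the Unix epoch.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder host taken from the curl examples in this document.
BASE_URL = "http://heron-tracker-url"


def ts(component, instance, metric):
    """Render a TS(...) expression."""
    return f"TS({component}, {instance}, {metric})"


def op(name, *args):
    """Render NAME(arg1, arg2, ...) for any operator."""
    return f"{name}({', '.join(str(arg) for arg in args)})"


query = op("RATE", op("SUM", ts("component1", "*", "__emit-count/stream1")))

params = {
    "cluster": "cluster1",
    "environ": "devel",
    "topology": "topologyName",
    "starttime": 1700000000,  # made-up value, assumed epoch seconds
    "endtime": 1700003600,    # made-up value, assumed epoch seconds
    "query": query,
}
url = f"{BASE_URL}/topologies/metricsquery?{urlencode(params)}"

# Decoding the URL again recovers the original query string.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(query)
print(decoded == query)
```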