The alerting of IoTDB is expected to support two modes:

* Writing triggered: the user writes data to the original time series, and every inserted data point triggers the judgment logic of `trigger`. If the alerting requirements are met, an alert is sent to the data sink, and the data sink forwards the alert to the external terminal.
* Continuous query: the user writes data to the original time series, `ContinuousQuery` periodically queries the original time series and writes the query results into a new time series, and every such write triggers the judgment logic of `trigger`. If the alerting requirements are met, an alert is sent to the data sink, and the data sink forwards the alert to the external terminal.
With the introduction of the `trigger` module and the `sink` module into IoTDB, users can currently combine these two modules with AlertManager to realize the writing triggered alerting mode.
The pre-compiled binary file can be downloaded here.
Running command:

```bash
./alertmanager --config.file=<your_file>
```
The Docker image is available at Quay.io or Docker Hub.

Running command:

```bash
docker run --name alertmanager -d -p 127.0.0.1:9093:9093 quay.io/prometheus/alertmanager
```
The following example covers most of the configuration rules. For detailed configuration rules, see here.
Example:
```yaml
# alertmanager.yml

global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@example.org'

# The root route on which each incoming alert enters.
route:
  # The root route must not have any matchers as it is the entry point for
  # all alerts. It needs to have a receiver configured so alerts that do not
  # match any of the sub-routes are sent to someone.
  receiver: 'team-X-mails'

  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  #
  # To aggregate by all possible labels use '...' as the sole label name.
  # This effectively disables aggregation entirely, passing through all
  # alerts as-is. This is unlikely to be what you want, unless you have
  # a very low alert volume or your upstream notification system performs
  # its own grouping. Example: group_by: [...]
  group_by: ['alertname', 'cluster']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 30s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 3h

  # All the above attributes are inherited by all child routes and can be
  # overwritten on each.

  # The child route trees.
  routes:
    # This route performs a regular expression match on alert labels to
    # catch alerts that are related to a list of services.
    - match_re:
        service: ^(foo1|foo2|baz)$
      receiver: team-X-mails

      # The service has a sub-route for critical alerts, any alerts
      # that do not match, i.e. severity != critical, fall-back to the
      # parent node and are sent to 'team-X-mails'
      routes:
        - match:
            severity: critical
          receiver: team-X-pager

    - match:
        service: files
      receiver: team-Y-mails

      routes:
        - match:
            severity: critical
          receiver: team-Y-pager

    # This route handles all alerts coming from a database service. If there's
    # no team to handle it, it defaults to the DB team.
    - match:
        service: database
      receiver: team-DB-pager
      # Also group alerts by affected database.
      group_by: [alertname, cluster, database]
      routes:
        - match:
            owner: team-X
          receiver: team-X-pager
        - match:
            owner: team-Y
          receiver: team-Y-pager

# Inhibition rules allow to mute a set of alerts given that another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    # Apply inhibition if the alertname is the same.
    # CAUTION:
    #   If all label names listed in `equal` are missing
    #   from both the source and target alerts,
    #   the inhibition rule will apply!
    equal: ['alertname']

receivers:
  - name: 'team-X-mails'
    email_configs:
      - to: 'team-X+alerts@example.org, team-Y+alerts@example.org'

  - name: 'team-X-pager'
    email_configs:
      - to: 'team-X+alerts-critical@example.org'
    pagerduty_configs:
      - routing_key: <team-X-key>

  - name: 'team-Y-mails'
    email_configs:
      - to: 'team-Y+alerts@example.org'

  - name: 'team-Y-pager'
    pagerduty_configs:
      - routing_key: <team-Y-key>

  - name: 'team-DB-pager'
    pagerduty_configs:
      - routing_key: <team-DB-key>
```
In the following example, we use this configuration:

```yaml
# alertmanager.yml

global:
  smtp_smarthost: ''
  smtp_from: ''
  smtp_auth_username: ''
  smtp_auth_password: ''
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 1m
  group_interval: 10m
  repeat_interval: 10h
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: ''

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname']
```
The AlertManager API is divided into two versions, `v1` and `v2`. The current AlertManager API version is `v2` (for its configuration, see api/v2/openapi.yaml).
By default, the prefix is `/api/v1` or `/api/v2`, and the endpoint for sending alerts is `/api/v1/alerts` or `/api/v2/alerts`. If the user specifies `--web.route-prefix`, for example `--web.route-prefix=/alertmanager/`, then the prefix becomes `/alertmanager/api/v1` or `/alertmanager/api/v2`, and the endpoint for sending alerts becomes `/alertmanager/api/v1/alerts` or `/alertmanager/api/v2/alerts`.
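Before wiring up the trigger, it can be useful to confirm that a freshly started AlertManager actually accepts alerts on this endpoint. The sketch below posts a single hand-built alert to the `v2` endpoint; it assumes Java 11+ (for `java.net.http`), the default address `127.0.0.1:9093`, and no route prefix, and the `AlertManagerSmokeTest` class name and payload are ours for illustration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AlertManagerSmokeTest {
  public static void main(String[] args) throws Exception {
    // The v2 API expects a JSON array of alerts, each with a "labels" map
    // (conventionally including "alertname") and optional "annotations".
    String body =
        "[{\"labels\":{\"alertname\":\"smoke_test\",\"severity\":\"warning\"},"
            + "\"annotations\":{\"summary\":\"manual test alert\"}}]";

    HttpRequest request =
        HttpRequest.newBuilder()
            .uri(URI.create("http://127.0.0.1:9093/api/v2/alerts"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

    // AlertManager answers 200 when the alert batch is accepted.
    System.out.println(response.statusCode());
  }
}
```

If the request succeeds, the test alert also shows up in the AlertManager web UI on port 9093.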
The user defines a trigger by creating a Java class and writing the logic in the hooks. Please refer to Triggers for the specific configuration process and for the usage of the `AlertManagerSink`-related tools provided by the `sink` module.
The following example creates the `org.apache.iotdb.trigger.AlertingExample` class. Its `alertManagerHandler` member variable sends alerts to the AlertManager instance at the address `http://127.0.0.1:9093/`.
When `value > 100.0`, it sends an alert of `critical` severity; when `50.0 < value <= 100.0`, it sends an alert of `warning` severity.
```java
package org.apache.iotdb.trigger;

/*
 * package importing is omitted here
 */

public class AlertingExample implements Trigger {

  private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler();

  // Points the handler at the v2 alerts endpoint of the local AlertManager.
  private final AlertManagerConfiguration alertManagerConfiguration =
      new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts");

  private String alertname;

  private final HashMap<String, String> labels = new HashMap<>();

  private final HashMap<String, String> annotations = new HashMap<>();

  // Called when the trigger is registered: open the handler and prepare
  // the static labels and annotations shared by all alerts.
  @Override
  public void onCreate(TriggerAttributes attributes) throws Exception {
    alertManagerHandler.open(alertManagerConfiguration);

    alertname = "alert_test";

    labels.put("series", "root.ln.wf01.wt01.temperature");
    labels.put("value", "");
    labels.put("severity", "");

    annotations.put("summary", "high temperature");
    annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}");
  }

  @Override
  public void onDrop() throws IOException {
    alertManagerHandler.close();
  }

  @Override
  public void onStart() {
    alertManagerHandler.open(alertManagerConfiguration);
  }

  @Override
  public void onStop() throws Exception {
    alertManagerHandler.close();
  }

  // Called for every inserted value: fill in the value and severity labels,
  // then emit an event to AlertManager if a threshold is crossed.
  @Override
  public Double fire(long timestamp, Double value) throws Exception {
    if (value > 100.0) {
      labels.put("value", String.valueOf(value));
      labels.put("severity", "critical");
      AlertManagerEvent alertManagerEvent = new AlertManagerEvent(alertname, labels, annotations);
      alertManagerHandler.onEvent(alertManagerEvent);
    } else if (value > 50.0) {
      labels.put("value", String.valueOf(value));
      labels.put("severity", "warning");
      AlertManagerEvent alertManagerEvent = new AlertManagerEvent(alertname, labels, annotations);
      alertManagerHandler.onEvent(alertManagerEvent);
    }
    return value;
  }

  // Batch variant of the hook: applies the same judgment logic to each value.
  @Override
  public double[] fire(long[] timestamps, double[] values) throws Exception {
    for (double value : values) {
      if (value > 100.0) {
        labels.put("value", String.valueOf(value));
        labels.put("severity", "critical");
        AlertManagerEvent alertManagerEvent = new AlertManagerEvent(alertname, labels, annotations);
        alertManagerHandler.onEvent(alertManagerEvent);
      } else if (value > 50.0) {
        labels.put("value", String.valueOf(value));
        labels.put("severity", "warning");
        AlertManagerEvent alertManagerEvent = new AlertManagerEvent(alertname, labels, annotations);
        alertManagerHandler.onEvent(alertManagerEvent);
      }
    }
    return values;
  }
}
```
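For debugging the connection to AlertManager, the Sink-module calls used by the trigger can also be exercised on their own. The following is a minimal sketch that reuses only the calls appearing in the example above (package importing is omitted here as well; the `SinkSmokeTest` class name is ours for illustration):

```java
public class SinkSmokeTest {
  public static void main(String[] args) throws Exception {
    // Open a handler against the same v2 alerts endpoint used by the trigger.
    AlertManagerHandler handler = new AlertManagerHandler();
    handler.open(new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts"));

    // Labels and annotations for a single hand-crafted event.
    HashMap<String, String> labels = new HashMap<>();
    labels.put("severity", "warning");
    HashMap<String, String> annotations = new HashMap<>();
    annotations.put("summary", "standalone sink test");

    // Send one event, then release the handler.
    handler.onEvent(new AlertManagerEvent("sink_test", labels, annotations));
    handler.close();
  }
}
```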
The following SQL statement registers the trigger named `root-ln-wf01-wt01-alert` on the `root.ln.wf01.wt01.temperature` time series; its operation logic is defined by the `org.apache.iotdb.trigger.AlertingExample` Java class.
```sql
CREATE TRIGGER `root-ln-wf01-wt01-alert`
AFTER INSERT
ON root.ln.wf01.wt01.temperature
AS "org.apache.iotdb.trigger.AlertingExample"
```
Once AlertManager has been deployed and started and the trigger has been created, we can test the alerting by writing data to the time series.
```sql
INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (1, 0);
INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (2, 30);
INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (3, 60);
INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (4, 90);
INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (5, 120);
```
After executing the above writing statements, we can receive an alerting email. Because our AlertManager configuration makes alerts of `critical` severity inhibit those of `warning` severity, the email we receive contains only the alert triggered by the write of `(5, 120)`.