The meter system provides a functional analysis language called MAL (Meter Analysis Language) that lets users analyze and aggregate meter data in the OAP streaming system. The result of an expression can be ingested by either the agent analyzer or the OC/Prometheus analyzer.
In MAL, an expression or sub-expression evaluates to one of two types: a sample family or a scalar.
A sample family is a set of samples and acts as the basic unit in MAL. For example:
instance_trace_count
The sample family above may contain the following samples which are provided by external modules, such as the agent analyzer:
instance_trace_count{region="us-west",az="az-1"} 100
instance_trace_count{region="us-east",az="az-3"} 20
instance_trace_count{region="asia-north",az="az-1"} 33
MAL supports four types of operations to filter samples in a sample family by tag.
For example, the following expression keeps the instance_trace_count samples whose region is us-west or asia-north and whose az is az-1:
instance_trace_count.tagMatch("region", "us-west|asia-north").tagEqual("az", "az-1")
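The filtering semantics can be illustrated outside MAL. The following Python sketch is not the OAP implementation: it models a sample family as a list of (tags, value) pairs, and the helper names tag_match and tag_equal are hypothetical stand-ins for tagMatch and tagEqual. It also assumes that the regular expression must match the whole tag value.

```python
import re

# A sample family modelled as a list of (tags, value) pairs (an assumption
# made for illustration; this is not how the OAP stores samples).
instance_trace_count = [
    ({"region": "us-west", "az": "az-1"}, 100),
    ({"region": "us-east", "az": "az-3"}, 20),
    ({"region": "asia-north", "az": "az-1"}, 33),
]


def tag_match(samples, key, pattern):
    """Keep samples whose tag `key` fully matches the regular expression."""
    regex = re.compile(pattern)
    return [(t, v) for t, v in samples if key in t and regex.fullmatch(t[key])]


def tag_equal(samples, key, expected):
    """Keep samples whose tag `key` equals `expected`."""
    return [(t, v) for t, v in samples if t.get(key) == expected]


# Mirrors: instance_trace_count.tagMatch("region", "us-west|asia-north").tagEqual("az", "az-1")
filtered = tag_equal(
    tag_match(instance_trace_count, "region", "us-west|asia-north"), "az", "az-1"
)
for tags, value in filtered:
    print(tags, value)
```

Chaining the two helpers narrows the family step by step, which is the same reading order as the MAL expression.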
MAL supports six types of operations to filter samples in a sample family by value.
For example, the following expression keeps the instance_trace_count samples whose value is >= 33:
instance_trace_count.valueGreaterEqual(33)
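As a rough illustration of value-based filtering, the Python sketch below applies the same threshold to the example samples. The function name value_greater_equal is a hypothetical stand-in for valueGreaterEqual, and the (tags, value) representation is an assumption of this sketch.

```python
# Example samples as (tags, value) pairs; the representation is illustrative only.
instance_trace_count = [
    ({"region": "us-west", "az": "az-1"}, 100),
    ({"region": "us-east", "az": "az-3"}, 20),
    ({"region": "asia-north", "az": "az-1"}, 33),
]


def value_greater_equal(samples, threshold):
    """Keep samples whose value is greater than or equal to `threshold`."""
    return [(tags, value) for tags, value in samples if value >= threshold]


kept = value_greater_equal(instance_trace_count, 33)
for tags, value in kept:
    print(tags, value)
```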
MAL allows tag manipulators to change (i.e. add/delete/update) tags and their values.
MAL supports using the metadata of K8s to manipulate the tags and their values. This feature requires authorizing the OAP Server to access K8s's API Server.
retagByK8sMeta(newLabelName, K8sRetagType, existingLabelName, namespaceLabelName): Adds a new tag to the sample family based on the value of an existing label. Several internal converting types are provided, including:
K8sRetagType.Pod2Service: Adds a tag to the sample using service as the key and $serviceName.$namespace as the value, resolved from the value of the given tag key, which represents the name of a pod.
For example:
container_cpu_usage_seconds_total{namespace=default, container=my-nginx, cpu=total, pod=my-nginx-5dc4865748-mbczh} 2
Expression:
container_cpu_usage_seconds_total.retagByK8sMeta('service', K8sRetagType.Pod2Service, 'pod', 'namespace')
Output:
container_cpu_usage_seconds_total{namespace=default, container=my-nginx, cpu=total, pod=my-nginx-5dc4865748-mbczh, service='nginx-service.default'} 2
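The retagging step can be pictured as a lookup followed by adding one tag. In the Python sketch below, POD_TO_SERVICE is a hypothetical in-memory table standing in for the K8s API Server, and retag_pod_to_service is an illustrative name rather than an OAP function. Leaving a sample unchanged when no service is found is an assumption of this sketch.

```python
# Hypothetical stand-in for K8s metadata: (namespace, pod name) -> service name.
POD_TO_SERVICE = {
    ("default", "my-nginx-5dc4865748-mbczh"): "nginx-service",
}


def retag_pod_to_service(samples, new_label, pod_label, namespace_label):
    """Add `new_label` = "<service>.<namespace>" to each sample whose pod is known."""
    result = []
    for tags, value in samples:
        new_tags = dict(tags)  # copy so the input samples are not mutated
        namespace = tags.get(namespace_label)
        service = POD_TO_SERVICE.get((namespace, tags.get(pod_label)))
        if service is not None:
            new_tags[new_label] = f"{service}.{namespace}"
        # Assumption: samples without a known service are passed through as-is.
        result.append((new_tags, value))
    return result


samples = [
    (
        {
            "namespace": "default",
            "container": "my-nginx",
            "cpu": "total",
            "pod": "my-nginx-5dc4865748-mbczh",
        },
        2,
    )
]
retagged = retag_pod_to_service(samples, "service", "pod", "namespace")
print(retagged[0])
```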
The following binary arithmetic operators are available in MAL:
Binary operators are defined between scalar/scalar, sampleFamily/scalar and sampleFamily/sampleFamily value pairs.
Between two scalars: they evaluate to another scalar that is the result of the operator being applied to both scalar operands:
1 + 2
Between a sample family and a scalar, the operator is applied to the value of every sample in the sample family. For example:
instance_trace_count + 2
or
2 + instance_trace_count
results in
instance_trace_count{region="us-west",az="az-1"} 102 // 100 + 2
instance_trace_count{region="us-east",az="az-3"} 22 // 20 + 2
instance_trace_count{region="asia-north",az="az-1"} 35 // 33 + 2
Between two sample families, a binary operator is applied to each sample in the sample family on the left and its matching sample in the sample family on the right. A new sample family with an empty name is generated, and only the matched tags are retained. Samples on the left that have no matching sample on the right do not appear in the result.
Suppose another sample family, instance_trace_analysis_error_count, contains:
instance_trace_analysis_error_count{region="us-west",az="az-1"} 20
instance_trace_analysis_error_count{region="asia-north",az="az-1"} 11
Example expression:
instance_trace_analysis_error_count / instance_trace_count
This returns a sample family containing the error rate of trace analysis. The sample with region us-east and az az-3 has no match and does not show up in the result:
{region="us-west",az="az-1"} 0.2 // 20 / 100
{region="asia-north",az="az-1"} 0.3333 // 11 / 33
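The matching behaviour between two sample families can be sketched in Python. This is a simplified model, not the OAP implementation: it assumes two samples match only when their tag sets are identical, and the divide helper is a hypothetical name.

```python
instance_trace_analysis_error_count = [
    ({"region": "us-west", "az": "az-1"}, 20),
    ({"region": "asia-north", "az": "az-1"}, 11),
]
instance_trace_count = [
    ({"region": "us-west", "az": "az-1"}, 100),
    ({"region": "us-east", "az": "az-3"}, 20),
    ({"region": "asia-north", "az": "az-1"}, 33),
]


def divide(left, right):
    """Divide each left sample by the right sample carrying identical tags."""
    # Index the right-hand family by its (hashable) tag set for quick lookup.
    right_by_tags = {frozenset(tags.items()): value for tags, value in right}
    result = []
    for tags, value in left:
        key = frozenset(tags.items())
        if key in right_by_tags:  # unmatched samples are dropped
            result.append((dict(tags), value / right_by_tags[key]))
    return result


error_rate = divide(instance_trace_analysis_error_count, instance_trace_count)
for tags, value in error_rate:
    print(tags, round(value, 4))
```

Because the us-east sample exists on only one side, it never finds a partner and is absent from the output.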
A sample family supports aggregation operations that combine the samples of a single sample family into a new sample family with fewer samples (sometimes just a single sample) and aggregated values.
These operations can aggregate over all label dimensions, or preserve distinct dimensions through the by parameter (the keyword by can be omitted):
<aggr-op>(by=[<tag1>, <tag2>, ...])
Example expression:
instance_trace_count.sum(by=['az'])
will output the following result:
instance_trace_count{az="az-1"} 133 // 100 + 33
instance_trace_count{az="az-3"} 20
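Grouped aggregation amounts to bucketing samples by the preserved tags and summing each bucket. The Python sketch below reproduces the result above; sum_by is a hypothetical helper, and the (tags, value) representation is an assumption of the sketch.

```python
instance_trace_count = [
    ({"region": "us-west", "az": "az-1"}, 100),
    ({"region": "us-east", "az": "az-3"}, 20),
    ({"region": "asia-north", "az": "az-1"}, 33),
]


def sum_by(samples, by):
    """Sum sample values, keeping only the tags listed in `by` as dimensions."""
    totals = {}  # insertion-ordered: groups appear in first-seen order
    for tags, value in samples:
        group = tuple((name, tags.get(name)) for name in by)
        totals[group] = totals.get(group, 0) + value
    return [(dict(group), total) for group, total in totals.items()]


by_az = sum_by(instance_trace_count, ["az"])
for tags, value in by_az:
    print(tags, value)
```

Passing an empty list for by collapses the family into a single sample, which corresponds to aggregating over all dimensions.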
Duration is a textual representation of a time range. The accepted formats are based on the ISO-8601 duration format PnDTnHnMn.nS, where a day is regarded as exactly 24 hours.
Examples:
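To make the format concrete, the Python sketch below parses a few duration strings chosen for illustration. It covers only an unsigned subset of the PnDTnHnMn.nS form, and parse_duration is a hypothetical helper, not the parser that the OAP uses.

```python
import re
from datetime import timedelta

# Unsigned subset of the ISO-8601 duration grammar: P[nD][T[nH][nM][n.nS]].
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_duration(text):
    """Convert a duration string such as 'P2DT3H4M' into a timedelta."""
    match = _DURATION.fullmatch(text)
    parts = {}
    if match is not None:
        parts = {
            name: float(number)
            for name, number in match.groupdict().items()
            if number is not None
        }
    # Reject strings with no component ("P", "PT") or a dangling "T" ("P2DT").
    if not parts or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)  # a day counts as exactly 24 hours


for text in ("PT20.345S", "PT15M", "PT10H", "P2D", "P2DT3H4M"):
    print(text, "->", parse_duration(text).total_seconds(), "seconds")
```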
increase(Duration): Calculates the increase in the time range.
rate(Duration): Calculates the per-second average rate of increase in the time range.
irate(): Calculates the per-second instant rate of increase in the time range.
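The difference between the three functions is easiest to see on a small counter series. The Python sketch below assumes Prometheus-like semantics: increase is the last value minus the first value in the window, rate divides that by the window length, and irate uses only the two most recent points. These are assumptions for illustration, not a description of the OAP internals, and counter resets are ignored.

```python
def increase(points, now, window_seconds):
    """Last value minus first value among points inside [now - window, now]."""
    in_window = [(ts, v) for ts, v in points if now - window_seconds <= ts <= now]
    if len(in_window) < 2:
        return 0.0
    return in_window[-1][1] - in_window[0][1]


def rate(points, now, window_seconds):
    """Per-second average rate of increase over the window."""
    return increase(points, now, window_seconds) / window_seconds


def irate(points):
    """Per-second rate computed from the two most recent points only."""
    if len(points) < 2:
        return 0.0
    (t0, v0), (t1, v1) = points[-2], points[-1]
    return (v1 - v0) / (t1 - t0)


# (timestamp in seconds, counter value), ordered by time.
points = [(0, 100), (60, 160), (120, 280)]
print(increase(points, now=120, window_seconds=120))
print(rate(points, now=120, window_seconds=120))
print(irate(points))
```

In this series the counter grows faster in the second minute, so the instant rate is higher than the average rate over the whole window.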
tag({allTags -> }): Updates the tags of samples. Users can add, drop, rename and update tags.
histogram(le: '<the tag name of le>'): Transforms less-based histogram buckets to meter system histogram buckets. The le parameter represents the tag name of the bucket.
histogram_percentile([<p scalar>]): Instructs the meter system to calculate the p-percentile (0 ≤ p ≤ 100) from the buckets.
time(): Returns the number of seconds since January 1, 1970 UTC.
forEach([string_array], Closure<Void> each): Iterates over all samples according to the first array argument, and provides two parameters to the closure given as the second argument:
element: the current element of the array.
tags: the tags of each sample.
MAL should instruct the meter system on how to downsample metrics. Downsampling refers not only to aggregating raw samples to the minute level, but also to expressing minute-level data at higher levels, such as hour and day.
The downsampling function is called downsampling in MAL, and it accepts a downsampling type as its argument.
The default type is AVG.
If users want to get the latest time from last_server_state_sync_time_in_seconds:
last_server_state_sync_time_in_seconds.tagEqual('production', 'catalog').downsampling(LATEST)
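The effect of the downsampling type can be sketched in Python for the two types named here, AVG and LATEST. The downsample helper and the (timestamp, value) representation are assumptions for illustration. The sketch shows only how several raw values falling into one time bucket are reduced to a single value.

```python
def downsample(points, kind="AVG"):
    """Reduce the (timestamp, value) points of one time bucket to a single value."""
    if not points:
        raise ValueError("cannot downsample an empty bucket")
    if kind == "AVG":
        return sum(value for _, value in points) / len(points)
    if kind == "LATEST":
        # Pick the value carried by the most recent timestamp.
        return max(points, key=lambda point: point[0])[1]
    raise ValueError(f"type not covered by this sketch: {kind}")


# Three raw samples that fall into the same minute bucket.
bucket = [(0, 10), (20, 30), (40, 50)]
print(downsample(bucket))            # default type is AVG
print(downsample(bucket, "LATEST"))
```

For a metric that records a timestamp, such as a last synchronization time, averaging would produce a meaningless value, which is why LATEST is the natural choice in the example above.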
The metric level functions extract level-relevant labels from metric labels, then inform the meter system of the level and layer to which the metric belongs.
service([svc_label1, svc_label2...], Layer): extracts service level labels from the array argument, and the layer from the Layer argument.
instance([svc_label1, svc_label2...], [ins_label1, ins_label2...], Layer, Closure<Map<String, String>> propertiesExtractor): extracts service level labels from the first array argument, instance level labels from the second array argument, and the layer from the Layer argument. propertiesExtractor is an optional closure that extracts instance properties from tags, e.g. { tags -> ['pod': tags.pod, 'namespace': tags.namespace] }.
endpoint([svc_label1, svc_label2...], [ep_label1, ep_label2...], Layer): extracts service level labels from the first array argument, endpoint level labels from the second array argument, and the layer from the Layer argument.
serviceRelation(DetectPoint, [source_svc_label1...], [dest_svc_label1...], Layer): DetectPoint includes DetectPoint.CLIENT and DetectPoint.SERVER. Extracts sourceService labels from the first array argument, destService labels from the second array argument, and the layer from the Layer argument.
processRelation(detect_point_label, [service_label1...], [instance_label1...], source_process_id_label, dest_process_id_label, component_label): extracts the DetectPoint label from the first argument, whose value should be client or server. Extracts Service labels from the first array argument, Instance labels from the second array argument, and the source and destination ProcessID labels from the fourth and fifth arguments.
The OAP can load the configuration at bootstrap. If the new configuration is not well-formed, the OAP fails to start up. The files are located at $CLASSPATH/otel-rules, $CLASSPATH/meter-analyzer-config, $CLASSPATH/envoy-metrics-rules and $CLASSPATH/zabbix-rules.
The file is written in YAML format, defined by the schema described below. Brackets indicate that a parameter is optional.
A full example can be found here
Generic placeholders are defined as follows:
<string>: A regular string.
<closure>: A closure with custom logic.

# initExp is the expression that initializes the current configuration file
initExp: <string>
# filter the metrics, only those metrics that satisfy this condition will be passed into the `metricsRules` below.
filter: <closure> # example: '{ tags -> tags.job_name == "vm-monitoring" }'
# expPrefix is executed before the metrics executes other functions.
expPrefix: <string>
# expSuffix is appended to all expression in this file.
expSuffix: <string>
# insert metricPrefix into metric name: <metricPrefix>_<raw_metric_name>
metricPrefix: <string>
# Metrics rule allow you to recompute queries.
metricsRules:
  [ - <metric_rules> ]
# The name of the rule, which is combined with the prefix 'meter_' to form the index/table name in storage.
name: <string>
# MAL expression.
exp: <string>
Please refer to OAP Self-Observability.