OAL (Observability Analysis Language) is provided to analyze incoming data in streaming mode.
OAL focuses on metrics of Service, Service Instance and Endpoint. Because of that, the language is easy to learn and use.
Since 6.3, the OAL engine is embedded in the OAP server runtime as oal-rt (OAL Runtime). OAL scripts are now located in the /config folder, so users can simply change them and restart the server to make the changes take effect. However, OAL is still a compiled language: the OAL Runtime generates Java code dynamically.
You can set SW_OAL_ENGINE_DEBUG=Y in the system environment to see which classes are generated.
Scripts should be named as
// Declare the metrics.
METRICS_NAME = from(SCOPE.(* | [FIELD][,FIELD ...]))
[.filter(FIELD OP [INT | STRING])]
.FUNCTION([PARAM][, PARAM ...])

// Disable hard code
disable(METRICS_NAME);
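To make the grammar concrete, here is a minimal Java sketch of how one statement, endpoint_abnormal = from(Endpoint.*).filter(responseCode in [404, 500, 503]).count(), can be read as a streaming pipeline: every source record is visited (from), records failing the condition are dropped (filter), and the survivors are folded into a per-entity result (FUNCTION). The class names OalPipelineSketch and EndpointSource are hypothetical; this is not the code OAL Runtime generates.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of one OAL statement as a streaming pipeline.
// Not the classes generated by OAL Runtime.
final class OalPipelineSketch {
    // Simplified Endpoint-scope source record (hypothetical fields).
    static final class EndpointSource {
        final String endpointId;
        final int responseCode;

        EndpointSource(String endpointId, int responseCode) {
            this.endpointId = endpointId;
            this.responseCode = responseCode;
        }
    }

    // from(Endpoint.*) -> visit every record
    // .filter(...)     -> skip records that do not satisfy the condition
    // .count()         -> increment a counter for the record's entity
    static Map<String, Long> run(List<EndpointSource> stream, Predicate<EndpointSource> filter) {
        Map<String, Long> counts = new HashMap<>();
        for (EndpointSource source : stream) {
            if (filter.test(source)) {
                counts.merge(source.endpointId, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Mirrors: from(Endpoint.*).filter(responseCode in [404, 500, 503]).count()
    static Map<String, Long> endpointAbnormal(List<EndpointSource> stream) {
        Set<Integer> abnormal = Set.of(404, 500, 503);
        return run(stream, s -> abnormal.contains(s.responseCode));
    }
}
```

Entities with no matching record never get a counter in this sketch, so they are simply absent from the result map.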
Primary SCOPEs are
EndpointRelation. There are also some secondary scopes, each of which belongs to one primary scope.
Read Scope Definitions to find all existing Scopes and Fields.
Use filter to build conditions on field values, using a field name and an expression.
Expressions can be linked by
(...). The supported OPs include
in [...],
like ...% and
like %...%, with type detection based on the field type. An incompatible type triggers a compile or code generation error.
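The like and in operators can be illustrated with a small Java sketch: like ...% is a prefix match, like %...% is a substring match, and in [...] is membership in a list of candidates. FilterOpsSketch is a hypothetical helper, not SkyWalking code; the suffix case (leading % only) is an assumption added by symmetry.

```java
import java.util.List;

// Hypothetical helper illustrating filter OP semantics; not SkyWalking code.
final class FilterOpsSketch {
    // like "serv%"  -> prefix match
    // like "%serv%" -> substring match
    // like "%serv"  -> suffix match (assumption, by symmetry)
    // no %          -> exact match
    static boolean like(String value, String pattern) {
        if (value == null) {
            return false;
        }
        boolean leading = pattern.startsWith("%");
        boolean trailing = pattern.length() > 1 && pattern.endsWith("%");
        String core = pattern.substring(leading ? 1 : 0,
                trailing ? pattern.length() - 1 : pattern.length());
        if (leading && trailing) {
            return value.contains(core);
        }
        if (trailing) {
            return value.startsWith(core);
        }
        if (leading) {
            return value.endsWith(core);
        }
        return value.equals(core);
    }

    // in [a, b, c] -> the field value equals one of the candidates
    static <T> boolean in(T value, List<T> candidates) {
        return candidates.contains(value);
    }
}
```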
The default functions are provided by the SkyWalking OAP core, and more can be implemented.
longAvg. The average of all inputs per scope entity. The input field must be a long.
instance_jvm_memory_max = from(ServiceInstanceJVMMemory.max).longAvg();
In this case, the inputs are the requests of each ServiceInstanceJVMMemory scope, and the average is computed over the max field.
doubleAvg. The average of all inputs per scope entity. The input field must be a double.
instance_jvm_cpu = from(ServiceInstanceJVMCPU.usePercent).doubleAvg();
In this case, the inputs are the requests of each ServiceInstanceJVMCPU scope, and the average is computed over the usePercent field.
percent. The number or ratio, expressed as a fraction of 100, of the inputs that match the condition.
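Both average functions keep one accumulator per scope entity (for example, per service instance). The Java sketch below assumes each accumulator is a running sum plus a count, and that longAvg uses integer division, so the fractional part is dropped; AvgSketch and its methods are hypothetical names, not SkyWalking's metrics classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical accumulators mirroring longAvg/doubleAvg; not SkyWalking's classes.
final class AvgSketch {
    static final class LongAvg {
        private long sum;
        private long count;

        void combine(long value) { sum += value; count++; }

        // Assumption: integer division, the fractional part is dropped.
        long value() { return count == 0 ? 0 : sum / count; }
    }

    static final class DoubleAvg {
        private double sum;
        private long count;

        void combine(double value) { sum += value; count++; }

        double value() { return count == 0 ? 0.0 : sum / count; }
    }

    // One LongAvg per scope entity id, fed in arrival order (streaming).
    static Map<String, LongAvg> longAvgPerEntity(String[] entityIds, long[] values) {
        Map<String, LongAvg> result = new HashMap<>();
        for (int i = 0; i < entityIds.length; i++) {
            result.computeIfAbsent(entityIds[i], k -> new LongAvg()).combine(values[i]);
        }
        return result;
    }
}
```

With inputs 10 and 15 for one instance, the long version reports 12 while the double version reports 12.5, which is why the input type of the field matters.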
endpoint_percent = from(Endpoint.*).percent(status == true);
In this case, all inputs are the requests of each endpoint, and the condition is endpoint.status == true.
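A minimal Java sketch of percent(condition), assuming it counts all inputs and the inputs matching the condition, then reports matched * 100 / total with integer arithmetic. PercentSketch is a hypothetical name; SkyWalking's own scaling and rounding may differ.

```java
// Hypothetical sketch of percent(condition); not SkyWalking's metrics class.
final class PercentSketch {
    private long matched;
    private long total;

    // Called once per input, with the evaluated condition (e.g. status == true).
    void combine(boolean conditionMatched) {
        total++;
        if (conditionMatched) {
            matched++;
        }
    }

    // Fraction of 100; assumption: integer arithmetic rounds down.
    long value() {
        return total == 0 ? 0 : matched * 100 / total;
    }
}
```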
rate. The rate, expressed as a fraction of 100, of the inputs matching the numerator condition relative to the inputs matching the denominator condition.
browser_app_error_rate = from(BrowserAppTraffic.*).rate(trafficCategory == BrowserAppTrafficCategory.FIRST_ERROR, trafficCategory == BrowserAppTrafficCategory.NORMAL);
In this case, all inputs are the requests of each browser app traffic. The numerator condition is trafficCategory == BrowserAppTrafficCategory.FIRST_ERROR and the denominator condition is trafficCategory == BrowserAppTrafficCategory.NORMAL. Parameter (1) is the numerator condition. Parameter (2) is the denominator condition.
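Unlike percent, rate evaluates two conditions independently on every input. The Java sketch below assumes a numerator counter and a denominator counter, reported as numerator * 100 / denominator; RateSketch and its Category enum are hypothetical stand-ins (the enum only mirrors the two BrowserAppTrafficCategory values used in the example).

```java
// Hypothetical sketch of rate(numeratorCondition, denominatorCondition).
final class RateSketch {
    // Hypothetical stand-in for the two BrowserAppTrafficCategory values used above.
    enum Category { FIRST_ERROR, NORMAL }

    private long numerator;
    private long denominator;

    // Each input is checked against both conditions independently.
    void combine(boolean numeratorMatched, boolean denominatorMatched) {
        if (numeratorMatched) {
            numerator++;
        }
        if (denominatorMatched) {
            denominator++;
        }
    }

    // Mirrors rate(trafficCategory == FIRST_ERROR, trafficCategory == NORMAL).
    void combine(Category category) {
        combine(category == Category.FIRST_ERROR, category == Category.NORMAL);
    }

    // Fraction of 100; assumption: integer arithmetic rounds down.
    long value() {
        return denominator == 0 ? 0 : numerator * 100 / denominator;
    }
}
```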
count. The sum of calls per scope entity.
service_calls_sum = from(Service.*).count();
In this case, the number of calls of each service.
histogram. Read Heatmap in WIKI
all_heatmap = from(All.latency).histogram(100, 20);
In this case, the heatmap of all incoming requests. Parameter (1) is the precision of the latency calculation: in the above case, 113ms and 193ms are considered the same, as both fall in the 101-200ms group. Parameter (2) is the number of groups. In the above case, there are 21 (param value + 1) groups: 0-100ms, 101-200ms, ..., 1901-2000ms, and 2000+ms.
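The bucket layout described above can be sketched in Java as follows. This follows the ranges exactly as written in the text (upper bounds inclusive, plus one overflow bucket); SkyWalking's real implementation may place the boundaries slightly differently, and HistogramSketch is a hypothetical name.

```java
// Hypothetical sketch of the bucket layout for histogram(precision, groups).
// For histogram(100, 20): 0-100, 101-200, ..., 1901-2000, 2000+ (21 buckets).
final class HistogramSketch {
    private final int precision;
    private final int groups;
    private final long[] counts;

    HistogramSketch(int precision, int groups) {
        this.precision = precision;
        this.groups = groups;
        this.counts = new long[groups + 1]; // param value + 1 buckets
    }

    // Upper bounds are inclusive, so 100 lands in 0-100 and 101 in 101-200.
    // Everything beyond groups * precision goes to the last (overflow) bucket.
    int bucketIndex(long latencyMs) {
        if (latencyMs <= 0) {
            return 0;
        }
        long index = (latencyMs - 1) / precision;
        return (int) Math.min(index, groups);
    }

    void combine(long latencyMs) {
        counts[bucketIndex(latencyMs)]++;
    }

    long count(int bucket) {
        return counts[bucket];
    }

    int bucketCount() {
        return counts.length;
    }
}
```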
apdex. Read Apdex in WIKI
service_apdex = from(Service.latency).apdex(name, status);
In this case, the Apdex score of each service. Parameter (1) is the service name, which determines the Apdex threshold value loaded from service-apdex-threshold.yml in the config folder. Parameter (2) is the status of the request. The status (success/failure) affects the Apdex calculation.
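A Java sketch of the standard Apdex formula, (satisfied + tolerating / 2) / total, where a request is satisfied at or under the threshold T and tolerating up to 4T. The per-service map stands in for service-apdex-threshold.yml, and counting failed requests as frustrated is an assumption about how status affects the score; ApdexSketch and forService are hypothetical names.

```java
import java.util.Map;

// Hypothetical sketch of an Apdex score; not SkyWalking's implementation.
final class ApdexSketch {
    private final long thresholdMs; // T, looked up by service name
    private long satisfied;
    private long tolerating;
    private long total;

    ApdexSketch(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    // Stand-in for reading service-apdex-threshold.yml: per-service T with a fallback.
    static ApdexSketch forService(String serviceName, Map<String, Long> thresholds, long defaultMs) {
        return new ApdexSketch(thresholds.getOrDefault(serviceName, defaultMs));
    }

    void combine(long latencyMs, boolean success) {
        total++;
        if (!success) {
            return; // assumption: failed requests count as frustrated
        }
        if (latencyMs <= thresholdMs) {
            satisfied++;
        } else if (latencyMs <= 4 * thresholdMs) {
            tolerating++;
        }
        // slower than 4T: frustrated, contributes nothing to the numerator
    }

    // Apdex = (satisfied + tolerating / 2) / total, in [0, 1].
    double score() {
        return total == 0 ? 0.0 : (satisfied + tolerating / 2.0) / total;
    }
}
```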
p50. Read percentile in WIKI
all_percentile = from(All.latency).percentile(10);
percentile is the first multiple-value metric, introduced in 7.0.0. Because it has multiple values, it can be queried through the getMultipleLinearIntValues GraphQL query. In this case, the p50 of all incoming requests. The parameter is the precision of the latency calculation; in the above case, 120ms and 124ms are considered the same. Before 7.0.0, use the p50 func(s) to calculate the metrics separately. They are still supported in 7.x, but are not recommended and are not included in the official OAL script.
all_p99 = from(All.latency).p99(10);
In this case, the p99 value of all incoming requests. The parameter is the precision of the p99 latency calculation; in the above case, 120ms and 124ms are considered the same.
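The precision parameter can be understood as a bucket width. The Java sketch below assumes each latency is stored in bucket value / precision (so 120ms and 124ms share bucket 12 at precision 10), and that a rank such as 99 is answered with the lower bound of the first bucket whose cumulative count reaches that rank. valueAt(99) corresponds to p99, and values() returns the p50, p75, p90, p95 and p99 set a single percentile metric carries. PercentileSketch is hypothetical, and SkyWalking's own rounding may differ.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of percentile(precision) / p99(precision); not SkyWalking code.
final class PercentileSketch {
    static final int[] RANKS = {50, 75, 90, 95, 99};

    private final int precision;
    // bucket index (value / precision) -> number of values, sorted by bucket
    private final TreeMap<Long, Long> buckets = new TreeMap<>();
    private long total;

    PercentileSketch(int precision) {
        this.precision = precision;
    }

    void combine(long latencyMs) {
        buckets.merge(latencyMs / precision, 1L, Long::sum);
        total++;
    }

    // Lower bound of the first bucket whose cumulative count reaches the rank.
    long valueAt(int rank) {
        long target = (long) Math.ceil(total * rank / 100.0);
        long seen = 0;
        for (Map.Entry<Long, Long> e : buckets.entrySet()) {
            seen += e.getValue();
            if (seen >= target) {
                return e.getKey() * precision;
            }
        }
        return 0; // no data yet
    }

    // Multiple values in one metric: p50, p75, p90, p95, p99.
    long[] values() {
        long[] out = new long[RANKS.length];
        for (int i = 0; i < RANKS.length; i++) {
            out[i] = valueAt(RANKS[i]);
        }
        return out;
    }
}
```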
The metrics name is used by storage implementors, the alarm module and the query module. Type inference is supported by the core.
All metrics data will be grouped by Scope.ID and min-level TimeBucket.
In the Endpoint scope, Scope.ID is the Endpoint id (the unique id based on the service and its Endpoint).
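Grouping by Scope.ID and minute-level TimeBucket means that two inputs for the same entity within the same minute are aggregated together, while the next minute starts a new group. The Java sketch below assumes a minute bucket encoded as yyyyMMddHHmm and a string key of the form id_bucket; both the encoding and the key format are hypothetical illustrations, not SkyWalking's storage layout.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: group inputs by (Scope.ID, minute-level time bucket).
final class TimeBucketSketch {
    // Assumption: a minute bucket encoded as yyyyMMddHHmm, e.g. 202001011230.
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    static long minuteBucket(LocalDateTime time) {
        return Long.parseLong(time.format(MINUTE));
    }

    // Hypothetical key format: "<scopeId>_<bucket>".
    static String key(String scopeId, LocalDateTime time) {
        return scopeId + "_" + minuteBucket(time);
    }

    // Counts inputs per (entity, minute); seconds within a minute share one group.
    static Map<String, Long> countByKey(String[] scopeIds, LocalDateTime[] times) {
        Map<String, Long> counts = new HashMap<>();
        for (int i = 0; i < scopeIds.length; i++) {
            counts.merge(key(scopeIds[i], times[i]), 1L, Long::sum);
        }
        return counts;
    }
}
```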
disable is an advanced statement in OAL, used only in certain cases. Some aggregations and metrics are defined through hard code in the core; the disable statement is designed to deactivate them, such as top_n_database_statement. By default, nothing is disabled.
// Calculate p99 of both Endpoint1 and Endpoint2
endpoint_p99 = from(Endpoint.latency).filter(name in ["Endpoint1", "Endpoint2"]).p99(10);

// Calculate p99 of Endpoint names starting with `serv`
serv_Endpoint_p99 = from(Endpoint.latency).filter(name like "serv%").p99(10);

// Calculate the avg response time of each Endpoint
endpoint_avg = from(Endpoint.latency).longAvg();

// Calculate the p50, p75, p90, p95 and p99 of each Endpoint, with 10 ms precision.
endpoint_percentile = from(Endpoint.latency).percentile(10);

// Calculate the percentage of responses whose status is true, for each endpoint.
endpoint_success = from(Endpoint.*).percent(status == true);

// Calculate the number of responses with code in [404, 500, 503], for each endpoint.
endpoint_abnormal = from(Endpoint.*).filter(responseCode in [404, 500, 503]).count();

// Calculate the number of requests with type in [RequestType.RPC, RequestType.gRPC], for each endpoint.
endpoint_rpc_calls_sum = from(Endpoint.*).filter(type in [RequestType.RPC, RequestType.gRPC]).count();

// Calculate the number of requests with endpoint name in ["/v1", "/v2"], for each endpoint.
endpoint_url_sum = from(Endpoint.*).filter(name in ["/v1", "/v2"]).count();

// Calculate the number of calls for each endpoint.
endpoint_calls = from(Endpoint.*).count();

disable(segment);
disable(endpoint_relation_server_side);
disable(top_n_database_statement);