Status: shipped. Operator reference for the MAL slice of the DSL Debug API. Design: SWIP-13. Index of related pages: DSL Debug API overview.
A MAL session attaches to one metric rule. Every scrape window that survives the rule's file-level filter produces one record in the response; within a record, each probe stage the expression executes appends one sample. The wire shape is:
```
nodes[]
  records[]
    startedAtMs      record boundary timestamp (ms)
    dsl              verbatim per-rule DSL text
    rule             rule envelope:
      metricPrefix
      name           per-rule name (no prefix)
      filter         file-level filter closure body, if any
      exp            exp: body verbatim
      expSuffix      file-level expSuffix verbatim, if any
    samples[]
      type           input | filter | function | output
      sourceText     verbatim DSL fragment for this probe
      continueOn     true (MAL captures kept-only; see overview)
      payload        SampleFamily.toJson() at this probe stage
      sourceLine     omitted for MAL (no per-line mapping)
```
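When reading captures by hand, it helps to flatten this tree into one row per probe sample. The sketch below assumes the session response has already been parsed with `json.loads`; `iter_samples` is a hypothetical client-side helper, not part of OAP.

```python
from typing import Iterator, Tuple

def iter_samples(response: dict) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (nodeId, startedAtMs, type, sourceText) for every captured sample."""
    for node in response.get("nodes", []):
        # Unreachable nodes may carry no records, so default to an empty list.
        for record in node.get("records", []):
            for sample in record.get("samples", []):
                yield (node.get("nodeId", "?"), record["startedAtMs"],
                       sample["type"], sample["sourceText"])
```

Rows keep the node id, so slices from different OAP nodes stay distinguishable.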
Sample types and the probes that emit them:
| type | Probe | Fired when |
|---|---|---|
| `filter` | `captureFilter` | The file-level `filter:` closure runs over the input samples (kept-only). |
| `input` | `captureInput` | The metric reference at the head of the expression resolves a `SampleFamily`. |
| `function` | `captureStage` | An in-expression chain op runs (`sum`, `tagEqual`, `service`, etc.). |
| `function` | `captureDownsample` | A downsampling op runs (e.g. `rate("PT1M")`). |
| `output` | `captureMeterEmit` | The metric is emitted to the persistence pipeline (terminal). |
`sample.sourceText` is the verbatim ANTLR slice of the chain segment from the original `exp:` body, so operators can grep the captured text against the source byte-for-byte. It carries no leading `.`: the dot belongs to the chain context, not to the `MethodCallContext` slice.
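`sample.payload` is the structured `SampleFamily.toJson()` output at that probe stage. Every sample's name, label set, value, and timestamp is present, truncated at `maxSamplesPerCapture` (default 64) with a `+N more` summary. The `output` sample is the exception: its payload describes the emitted meter (`metric`, `entity`, `valueType`, `timeBucket`, `value`), as the end-to-end example shows.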
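The truncation rule amounts to simple arithmetic: with the default cap of 64, a family of 100 samples keeps the first 64 and reports `+36 more`. The sketch below only illustrates that arithmetic. The function name and the `more` field are hypothetical; the exact JSON layout of the summary is not specified here.

```python
def truncate_for_capture(items: list, max_samples: int = 64) -> dict:
    """Illustrative only: keep the first max_samples items and summarize the rest."""
    kept = items[:max_samples]
    result = {"items": kept}
    dropped = len(items) - len(kept)
    if dropped > 0:
        # Hypothetical field name; the real payload format may differ.
        result["more"] = f"+{dropped} more"
    return result
```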
When no session is bound, each codegen-emitted probe call site reduces to a single volatile boolean read guarding a rarely taken branch. After JIT warm-up that check costs close to nothing, so idle overhead is effectively free.
The shared admin HTTP host (admin-server) is enabled by default; turn on the DSL-debug feature on top of it:
```shell
SW_DSL_DEBUGGING=default
```
`injectionEnabled` is a boot-time codegen switch that defaults to `true` once the `dsl-debugging` module is enabled. With it on, the MAL generator emits per-rule `GateHolder` fields and probe call sites, which is what lets debug sessions capture samples. Set it to `false` only if you want the REST surface but cannot accept any codegen-side probe overhead. With `false`, the MAL bytecode is byte-identical to a build without SWIP-13, and `POST /dsl-debugging/session` returns `503 injection_disabled`. Changing the flag requires an OAP restart:
```shell
SW_DSL_DEBUGGING_INJECTION_ENABLED=false  # default is true; set false to disable probes
```
SECURITY: capture payloads include MAL builder state and sample-family contents. Treat the admin port as authenticated infrastructure; see the Security Notice in the Admin API readme.
A session targets one MAL metric rule. The key tuple is (catalog, name, ruleName):
| Field | Source |
|---|---|
| `catalog` | One of `otel-rules`, `log-mal-rules`, `telegraf-rules`: the directory the rule file lives in |
| `name` | The rule file name, without `.yaml` |
| `ruleName` | The full metric name (`metricPrefix` + `_` + per-rule name) |
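The key tuple can be assembled mechanically from a rule file's location and contents. This sketch uses a hypothetical helper, `session_path`, to build the query string with the standard library. It rejects catalogs outside the three MAL directories client-side (the server answers `400 invalid_catalog` for those) and strips a `.yaml` suffix if one is passed.

```python
from urllib.parse import urlencode

MAL_CATALOGS = {"otel-rules", "log-mal-rules", "telegraf-rules"}

def full_metric_name(metric_prefix: str, per_rule_name: str) -> str:
    # ruleName on the wire is metricPrefix + "_" + per-rule name.
    return f"{metric_prefix}_{per_rule_name}"

def session_path(catalog: str, file_name: str, metric_prefix: str, per_rule_name: str) -> str:
    """Build the session-install path for one MAL metric rule (hypothetical helper)."""
    if catalog not in MAL_CATALOGS:
        raise ValueError(f"not a MAL catalog: {catalog}")
    if file_name.endswith(".yaml"):
        file_name = file_name[: -len(".yaml")]  # name is the file name without .yaml
    query = urlencode({
        "catalog": catalog,
        "name": file_name,
        "ruleName": full_metric_name(metric_prefix, per_rule_name),
    })
    return f"/dsl-debugging/session?{query}"
```

For the shipped `otel-rules/vm.yaml`, `session_path("otel-rules", "vm.yaml", "vm", "cpu_total_percentage")` yields the path used in the example that follows.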
Example: the shipped `otel-rules/vm.yaml` declares the metric prefix `vm` and the per-rule name `cpu_total_percentage`, so the full metric name is `vm_cpu_total_percentage`. The session install call is:

```
POST /dsl-debugging/session?catalog=otel-rules&name=vm&ruleName=vm_cpu_total_percentage
```
To list the metrics a runtime-rule MAL file exposes, query GET /runtime/rule/list and pull the ruleNames associated with the catalog/name pair (the runtime-rule receiver records every rule's metric catalog).
The following example applies a MAL rule through the runtime-rule API. The rule has a top-level `filter` clause, so every probe stage (filter → input → function → output) appears in the captures.
```yaml
# /tmp/mal-with-filter.yaml
filter: "{ tags -> tags.service_name == 'my-svc' }"
metricPrefix: e2e_demo
expSuffix: service(['service_name'], Layer.GENERAL)
metricsRules:
  - name: filtered_requests
    exp: e2e_demo_request_count_total.sum(['service_name'])
```
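To make the stage order concrete, here is a plain-Python model of what this rule does to one batch of samples. It is an illustration of the semantics under simplifying assumptions (samples as dicts, a single window, no downsampling, `expSuffix` ignored), not how MAL actually executes the Groovy closure; `apply_rule` is a hypothetical name.

```python
from collections import defaultdict

def apply_rule(samples: list) -> dict:
    # Stage "filter": the file-level closure { tags -> tags.service_name == 'my-svc' }
    kept = [s for s in samples if s["labels"].get("service_name") == "my-svc"]
    # Stage "input": the head metric reference selects the family by name
    family = [s for s in kept if s["name"] == "e2e_demo_request_count_total"]
    # Stage "function": sum(['service_name']) groups by that label and adds the values
    sums = defaultdict(float)
    for s in family:
        sums[s["labels"]["service_name"]] += s["value"]
    # Stage "output": one value per service under the full metric name
    return {("e2e_demo_filtered_requests", svc): v for svc, v in sums.items()}
```

Two `my-svc` samples with values 40 and 2 would sum to 42, the kind of value the `output` stage reports; samples for other services never reach the `input` stage.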
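```shell
# Apply the rule through the runtime-rule API
curl -s -X POST -H 'Content-Type: text/plain' \
  --data-binary '@/tmp/mal-with-filter.yaml' \
  'http://OAP:17128/runtime/rule/addOrUpdate?catalog=otel-rules&name=mal-with-filter'

# Install a debug session on the rule's metric; use the returned session id as SESSION_ID below
curl -s -X POST \
  'http://OAP:17128/dsl-debugging/session?catalog=otel-rules&name=mal-with-filter&ruleName=e2e_demo_filtered_requests&clientId=alice'

# Fetch the captures collected so far
curl -s 'http://OAP:17128/dsl-debugging/session/SESSION_ID'
```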
A trimmed slice (one record = one scrape window):
```json
{
  "sessionId": "76b3266a-...",
  "capturedAt": 1777967923700,
  "ruleKey": {
    "catalog": "otel-rules",
    "name": "mal-with-filter",
    "ruleName": "e2e_demo_filtered_requests"
  },
  "nodes": [{
    "nodeId": "0.0.0.0_11800",
    "status": "ok",
    "records": [{
      "startedAtMs": 1777967921000,
      "dsl": "(e2e_demo_request_count_total.sum(['service_name'])).service(['service_name'], Layer.GENERAL)",
      "rule": {
        "metricPrefix": "e2e_demo",
        "name": "filtered_requests",
        "filter": "{ tags -> tags.service_name == 'my-svc' }",
        "exp": "e2e_demo_request_count_total.sum(['service_name'])",
        "expSuffix": "service(['service_name'], Layer.GENERAL)"
      },
      "samples": [
        {
          "type": "filter",
          "sourceText": "{ tags -> tags.service_name == 'my-svc' }",
          "continueOn": true,
          "payload": {
            "families": 1,
            "items": [ /* one entry per surviving SampleFamily: name, samples count, items[] */ ]
          }
        },
        {
          "type": "input",
          "sourceText": "e2e_demo_request_count_total",
          "continueOn": true,
          "payload": { /* head SampleFamily: name, samples, items[] */ }
        },
        {
          "type": "function",
          "sourceText": "sum(['service_name'])",
          "continueOn": true,
          "payload": { /* SampleFamily after sum */ }
        },
        {
          "type": "output",
          "sourceText": "e2e_demo_filtered_requests",
          "continueOn": true,
          "payload": {
            "metric": "e2e_demo_filtered_requests",
            "entity": "MeterEntity(scopeType=SERVICE, serviceName=my-svc, …)",
            "valueType": "sum",
            "timeBucket": 202605091036,
            "value": 42
            /* shape depends on valueType: number for Sum/Avg/Max/Min/CPM/Latest…,
               object {bucket: count} for histograms / *Labeled functions,
               omitted for non-scalar holders. NaN/±Infinity render as strings. */
          }
        }
      ]
    }]
  }]
}
```
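Each `sourceText` matches the rule text byte-for-byte. In the capture above, the `filter` sample carries the filter closure body, the `input` and `function` samples carry verbatim ANTLR slices of the `exp:` body, and the `output` sample carries the full metric name. The record-level `rule` envelope echoes the structured rule config, so operators don't have to re-resolve the file.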
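Because the envelope travels with every record, a capture can be checked against its own rule text without opening the YAML file. A minimal sketch, assuming the sample kinds line up with rule fields as in the example capture; `check_source_text` is a hypothetical helper. It checks `input`/`function` fragments against the record's `dsl` string, which in the example contains both the `exp:` body and `expSuffix` verbatim.

```python
def check_source_text(record: dict) -> list:
    """Return a description of every sample whose sourceText doesn't match the rule text."""
    rule = record["rule"]
    full_name = f'{rule["metricPrefix"]}_{rule["name"]}'
    problems = []
    for sample in record.get("samples", []):
        kind, text = sample["type"], sample["sourceText"]
        if kind == "filter":
            ok = text == rule.get("filter")
        elif kind in ("input", "function"):
            # No leading "." and a verbatim substring of the combined DSL text.
            ok = not text.startswith(".") and text in record["dsl"]
        elif kind == "output":
            ok = text == full_name
        else:
            ok = False  # unknown sample type
        if not ok:
            problems.append(f"{kind}: {text!r}")
    return problems
```

An empty list means every captured fragment can be found verbatim in the rule as loaded.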
Stop the session:

```shell
curl -s -X POST 'http://OAP:17128/dsl-debugging/session/SESSION_ID/stop'
```
Each OAP node's captures come back as its own entry in `nodes[]`; unreachable peers appear with `status: "unreachable"` rather than being omitted. There is no cross-node merge: each slice is self-contained.
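Since slices are never merged, a quick per-node summary is often the first thing to look at. A minimal sketch over the parsed response; `summarize_nodes` is a hypothetical helper, and it deliberately keeps each node separate rather than adding records across nodes.

```python
def summarize_nodes(response: dict) -> dict:
    """Map each nodeId to (status, record count) without merging slices."""
    summary = {}
    for node in response.get("nodes", []):
        records = node.get("records", [])  # unreachable peers may carry none
        summary[node.get("nodeId", "?")] = (node.get("status"), len(records))
    return summary
```

A node reporting `("ok", 0)` is reachable but has not captured anything yet, which is different from a peer that is `unreachable`.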
Error responses:

| Response | Meaning |
|---|---|
| `400 invalid_catalog` | The wire `catalog` is not one of the MAL catalogs. |
| `400 missing_param` | `name` or `ruleName` is missing. |
| `404 rule_not_found` | No live MAL artifact for the tuple on this node: the rule was never loaded, was inactivated, or this node hasn't compiled it yet. |
| `503 injection_disabled` | `injectionEnabled=false`. Restart with the flag on to debug. |
| `500 registry_misconfigured` | A recorder factory wiring bug; file an issue. |
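Scripts that install sessions can turn these responses into operator hints. The sketch below assumes you have already extracted the HTTP status and the error code string from the response (where the code sits in the body is not specified here). The hint texts paraphrase the table and the `400 invalid_limits` case described with the session limits; `next_step` and the hint wording are hypothetical.

```python
# Hypothetical operator hints keyed by (HTTP status, error code).
NEXT_STEP = {
    (400, "invalid_catalog"): "use otel-rules, log-mal-rules or telegraf-rules as catalog",
    (400, "missing_param"): "supply both name and ruleName",
    (400, "invalid_limits"): "keep recordCap and retentionMillis within their caps",
    (404, "rule_not_found"): "check the rule is loaded and active; this node may not have compiled it yet",
    (503, "injection_disabled"): "restart OAP with injectionEnabled=true",
    (500, "registry_misconfigured"): "file an issue",
}

def next_step(status: int, code: str) -> str:
    """Return a hint for a known error, or a generic message otherwise."""
    return NEXT_STEP.get((status, code), f"unexpected response {status} {code}")
```

Of these, only `rule_not_found` can plausibly clear up on its own (a node that hasn't compiled the rule yet); the others need a changed request or a restart.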
Session limits:

| Field | Default | Hard cap | Purpose |
|---|---|---|---|
| `recordCap` | 100 | 100 | Max records before the recorder marks itself captured and refuses appends. |
| `retentionMillis` | 300000 (5m) | 3600000 (1h) | Wall-clock retention; the session is reaped after the deadline whether or not it was explicitly stopped. |
Out-of-range values return `400 invalid_limits` from `POST /dsl-debugging/session`. Override the defaults per session, within the caps above, in the request body:

```json
{ "recordCap": 50, "retentionMillis": 600000 }
```
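A client can reject out-of-range limits before the server does. This sketch builds the request body and enforces the hard caps from the table. It assumes both values must also be positive, a lower bound the table doesn't state, so treat it as an assumption. `session_body` is a hypothetical helper.

```python
import json

HARD_CAPS = {"recordCap": 100, "retentionMillis": 3_600_000}

def session_body(record_cap: int = 100, retention_millis: int = 300_000) -> str:
    """Build the POST body, rejecting values that would draw 400 invalid_limits."""
    for field, value in (("recordCap", record_cap), ("retentionMillis", retention_millis)):
        # Assumed lower bound: limits must be positive.
        if not 0 < value <= HARD_CAPS[field]:
            raise ValueError(f"{field}={value} outside (0, {HARD_CAPS[field]}]")
    return json.dumps({"recordCap": record_cap, "retentionMillis": retention_millis})
```

`session_body(50, 600_000)` produces the body shown above; `session_body(record_cap=101)` raises locally instead of round-tripping to OAP.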