| # DSL Debug API — OAL |
| |
| > Status: **shipped**. Operator reference for the OAL slice of the DSL Debug |
| > API. Design: [SWIP-13](../../../swip/SWIP-13.md). Index of related pages: |
| > [DSL Debug API overview](dsl-debugging.md). |
| |
| ## What it captures |
| |
| OAL's gate is **per-metric**: each generated metric (e.g. |
| `service_relation_server_cpm`) has its own gate holder, and a debug |
| session attaches to one metric. Sibling rules under the same source |
| dispatcher stay silent. |
| |
| A session against `(catalog=oal, name=<file>, ruleName=<metric>)` |
| captures every source event walked through that one metric's pipeline. |
| Every event produces one **record**; within a record, each probe stage |
| appends one **sample**: |
| |
| ```text |
| nodes[] |
|   records[] |
|     startedAtMs  — record boundary timestamp (ms) |
|     dsl          — verbatim per-metric `.oal` line |
|     rule         — { ruleName, sourceLine } |
|     samples[] |
|       type        — input | filter | aggregation | output |
|       sourceText  — verbatim ANTLR slice from the `.oal` file |
|       continueOn  — true = pipeline continued past this step |
|       payload     — { type, scope, fields | timeBucket | … } |
|       sourceLine  — 1-based line in the source `.oal` file |
| ``` |
| |
| Sample types and the probes that emit them: |
| |
| | `type` | Probe | Fired when | |
| |---------------|----------------------|----------------------------------------------------------------------------------------------------| |
| | `input` | `captureSource` | A source event arrives at the metric's pipeline. Payload = `ISource.toJson()` with all source columns. | |
| | `filter` | `captureFilter` | An OAL `.filter(...)` clause runs. **Both** kept (`continueOn=true`) **and** rejected (`continueOn=false`) branches are captured. | |
| | `aggregation` | `captureAggregation` | The metric's aggregation function runs (`cpm()`, `percentile2(10)`, ...). Carries the post-aggregation source view. | |
| | `output` | `captureEmit` | The metric is emitted to the persistence pipeline (terminal). Payload = `Metrics.toJson()` with `count` / `total` / `value` / `timeBucket` etc. | |
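
The stage order in the table can be read back from a record's `samples` list. The Python sketch below assumes a record shaped like the layout shown earlier and reports which stages were captured and where the pipeline stopped. `stage_trace` is a hypothetical helper name, and the two records are hand-written for illustration rather than captured output.

```python
def stage_trace(record):
    """Return (stages, stopped_at) for one captured record.

    stages     -- sample types in capture order
    stopped_at -- type of the first sample with continueOn == False, else None
    """
    stages = []
    stopped_at = None
    for sample in record.get("samples", []):
        stages.append(sample["type"])
        if stopped_at is None and not sample.get("continueOn", True):
            stopped_at = sample["type"]
    return stages, stopped_at


# Hand-written records (assumption: same field names as the layout above).
kept = {"samples": [
    {"type": "input", "continueOn": True},
    {"type": "filter", "continueOn": True},
    {"type": "aggregation", "continueOn": True},
]}
rejected = {"samples": [
    {"type": "input", "continueOn": True},
    {"type": "filter", "continueOn": False},
]}

print(stage_trace(kept))
print(stage_trace(rejected))
```

A record whose trace ends at `filter` with `stopped_at == "filter"` is a rejected event; one that reaches `aggregation` passed every filter.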
| |
| ### sourceText is the verbatim ANTLR slice |
| |
| Each `sourceText` is extracted at `.oal` parse time via |
| `ctx.getStart().getInputStream().getText(Interval.of(...))`. Whitespace |
| and identifier spelling are byte-identical to the source file: |
| |
| - `input.sourceText` — the source clause (e.g. `from(ServiceRelation.*)`). |
| - `filter.sourceText` — the filter clause **including** the leading `.` |
| (the dot is part of the slice, e.g. `.filter(detectPoint == DetectPoint.SERVER)`). |
| - `aggregation.sourceText` — the aggregate function call |
| (e.g. `cpm()` / `percentile2(10)`). |
| - `output.sourceText` — the metric name. |
| |
| Operators can grep the captured `sourceText` against the original `.oal` |
| file directly. |
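
Because the slices are byte-identical, the grep can also be done as a plain substring check. This is a minimal Python sketch under that assumption; `missing_slices` is a hypothetical helper, and the `.oal` text and record are typed in by hand for illustration.

```python
def missing_slices(oal_text, record):
    """Return the sourceText values that are NOT verbatim substrings of oal_text."""
    return [
        sample["sourceText"]
        for sample in record.get("samples", [])
        if sample["sourceText"] not in oal_text
    ]


# Illustrative inputs: a two-line metric declaration and a matching record.
OAL_TEXT = (
    "service_relation_server_cpm = from(ServiceRelation.*)\n"
    "    .filter(detectPoint == DetectPoint.SERVER).cpm();\n"
)
record = {"samples": [
    {"type": "input", "sourceText": "from(ServiceRelation.*)"},
    {"type": "filter", "sourceText": ".filter(detectPoint == DetectPoint.SERVER)"},
    {"type": "aggregation", "sourceText": "cpm()"},
    {"type": "output", "sourceText": "service_relation_server_cpm"},
]}

print(missing_slices(OAL_TEXT, record))
```

An empty result means every captured slice was found in the file text. A non-empty result suggests the session was captured against a different revision of the `.oal` file than the one being searched.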
| |
| ### Both kept and rejected filter branches |
| |
| Unlike MAL (where rejected executions are dropped to avoid tag-cardinality |
| noise), OAL captures both filter branches. OAL filters are deterministic |
| discriminators (CLIENT-vs-SERVER, layer matchers, status / latency |
| predicates), so a rejected sample (`continueOn=false`) shows the filter |
| doing its job and is useful for verifying partition logic. |
| |
| When no session is bound, each codegen-emitted probe call site costs a |
| single volatile-boolean read, so the idle overhead is negligible. |
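
One way to check partition logic from a capture is to tally records by filter outcome. The Python sketch below assumes each record carries `samples` with `type` and `continueOn` as described above. `filter_tally` and the sample records are hypothetical, written only to illustrate the idea.

```python
from collections import Counter


def filter_tally(records):
    """Count records as 'kept', 'rejected', or 'no_filter' by their filter samples."""
    tally = Counter()
    for record in records:
        filters = [s for s in record.get("samples", []) if s["type"] == "filter"]
        if not filters:
            tally["no_filter"] += 1
        elif all(s["continueOn"] for s in filters):
            tally["kept"] += 1
        else:
            # At least one filter clause rejected the source event.
            tally["rejected"] += 1
    return tally


# Hand-written records: two SERVER events kept, one CLIENT event rejected.
records = [
    {"samples": [{"type": "input", "continueOn": True},
                 {"type": "filter", "continueOn": True}]},
    {"samples": [{"type": "input", "continueOn": True},
                 {"type": "filter", "continueOn": True}]},
    {"samples": [{"type": "input", "continueOn": True},
                 {"type": "filter", "continueOn": False}]},
]

print(dict(filter_tally(records)))
```

If the kept and rejected counts do not match the traffic mix that was driven, the filter expression is the first place to look.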
| |
| ## Enabling |
| |
| The shared admin HTTP host (`admin-server`) is enabled by default; turn on the |
| DSL-debug feature on top of it: |
| |
| ```bash |
| SW_DSL_DEBUGGING=default |
| ``` |
| |
| `injectionEnabled` is a **boot-time codegen switch** that defaults to `true` |
| once the `dsl-debugging` module is enabled. With it on, the OAL dispatcher |
| template emits per-metric `GateHolder` fields and probe call sites, so debug |
| sessions capture samples. Set it to `false` only when the REST surface is |
| wanted but codegen-side probe overhead is not acceptable; the OAL bytecode is |
| then byte-identical to a build without SWIP-13. Flipping the flag requires an |
| OAP restart: |
| |
| ```bash |
| SW_DSL_DEBUGGING_INJECTION_ENABLED=false # default is true; set false to disable probes |
| ``` |
| |
| > SECURITY: capture payloads include source-event contents (service names, |
| > endpoint names, span attributes). Treat the admin port as authenticated |
| > infrastructure — see |
| > [Admin API readme — Security Notice](readme.md#-security-notice). |
| |
| ## Picking the rule key |
| |
| A session targets one OAL **metric**. The key tuple is |
| `(catalog=oal, name=<file>, ruleName=<metric>)`: |
| |
| | Field | Source | |
| |------------|-----------------------------------------------------------------------| |
| | `catalog` | `oal` | |
| | `name` | The `.oal` file the metric is declared in (e.g. `core.oal`). | |
| | `ruleName` | The metric name on the LHS of the `=` (e.g. `service_relation_server_cpm`). | |
| |
| To list the metrics loaded on a node, query |
| `GET /runtime/oal/files` — each file's response lists its metrics. The |
| same endpoint also exposes the rules registered against each source — |
| useful when picking a metric whose filter clauses you want to inspect. |
| |
| ## End-to-end example |
| |
| The shipped `core.oal` declares `service_relation_server_cpm` against |
| `ServiceRelation` with a SERVER detect-point filter: |
| |
| ```text |
| service_relation_server_cpm = from(ServiceRelation.*) |
| .filter(detectPoint == DetectPoint.SERVER).cpm(); |
| ``` |
| |
| ### 1. Open a debug session |
| |
| ```bash |
| curl -s -X POST \ |
| 'http://OAP:17128/dsl-debugging/session?catalog=oal&name=core.oal&ruleName=service_relation_server_cpm&clientId=alice' |
| ``` |
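
When sessions are opened from a script, the same URL can be assembled from the key tuple. This Python sketch only builds the request URL and sends nothing. `session_url` is a hypothetical helper, and `OAP:17128` is the placeholder host and port used in the curl command above.

```python
from urllib.parse import urlencode


def session_url(base, name, rule_name, client_id):
    """Build the session-install URL for one OAL metric (catalog is always 'oal')."""
    query = urlencode({
        "catalog": "oal",
        "name": name,            # the .oal file the metric is declared in
        "ruleName": rule_name,   # the metric name on the LHS of the '='
        "clientId": client_id,
    })
    return f"{base}/dsl-debugging/session?{query}"


url = session_url("http://OAP:17128", "core.oal",
                  "service_relation_server_cpm", "alice")
print(url)
```

`urlencode` percent-encodes any reserved characters in the values, which matters if a file or client name ever contains spaces or `&`.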
| |
| ### 2. Drive ingest |
| |
| Send any agent traffic that produces inter-service spans (HTTP between |
| two SkyWalking-instrumented services, gRPC, etc.). The dispatcher fires |
| the metric's pipeline on every `ServiceRelation` source event. |
| |
| ### 3. Poll |
| |
| ```bash |
| curl -s 'http://OAP:17128/dsl-debugging/session/SESSION_ID' |
| ``` |
| |
| A trimmed slice (one record = one source event): |
| |
| ```json |
| { |
|   "ruleKey": { "catalog": "oal", "name": "core.oal", |
|                "ruleName": "service_relation_server_cpm" }, |
|   "nodes": [{ |
|     "nodeId": "0.0.0.0_11800", |
|     "status": "ok", |
|     "records": [{ |
|       "startedAtMs": 1778115085149, |
|       "dsl": "service_relation_server_cpm = from(ServiceRelation.*).filter(detectPoint == DetectPoint.SERVER).cpm();", |
|       "rule": { "ruleName": "service_relation_server_cpm", "sourceLine": "30" }, |
|       "samples": [ |
|         { "type": "input", |
|           "sourceText": "from(ServiceRelation.*)", |
|           "continueOn": true, |
|           "payload": { |
|             "type": "ServiceRelation", "scope": 4, |
|             "fields": { |
|               "sourceServiceName": "e2e-service-consumer", |
|               "destServiceName": "e2e-service-provider", |
|               "detectPoint": "SERVER", |
|               "endpoint": "POST:/users", |
|               "componentId": 1, "latency": 962, "status": true, |
|               "httpResponseStatusCode": 200, |
|               "timeBucket": 202605070051 |
|             } |
|           }, |
|           "sourceLine": 30 }, |
|         { "type": "filter", |
|           "sourceText": ".filter(detectPoint == DetectPoint.SERVER)", |
|           "continueOn": true, |
|           "payload": { "type": "ServiceRelation", "fields": { /* same row */ } }, |
|           "sourceLine": 30 }, |
|         { "type": "aggregation", |
|           "sourceText": "cpm()", |
|           "continueOn": true, |
|           "payload": { "type": "ServiceRelationServerCpmMetrics", |
|                        "timeBucket": 202605070051, |
|                        "count": 1, "total": 1, "value": 1 }, |
|           "sourceLine": 30 } |
|       ] |
|     }] |
|   }] |
| } |
| ``` |
| |
| The same source event commonly shows up as a rejected `filter` sample in a |
| session attached to a sibling rule (e.g. the CLIENT-detect-point sibling). |
| The rejected sample shows `continueOn=false` and no `aggregation`/`output` |
| sample follows, because that metric's pipeline stopped at the filter. |
| |
| ### 4. Stop |
| |
| ```bash |
| curl -s -X POST 'http://OAP:17128/dsl-debugging/session/SESSION_ID/stop' |
| ``` |
| |
| ## Cluster behaviour |
| |
| - **Install** broadcasts to every reachable peer; each peer attaches its own |
| recorder to its dispatcher. |
| - **Collect** broadcasts and concatenates per-node slices. |
| - **Stop** broadcasts; a peer that misses the stop drops its session when the retention timeout expires. |
| |
| No cross-node merge — each peer's slice is self-contained. |
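
Since slices are concatenated rather than merged, a client can keep them apart by `nodeId`. The Python sketch below assumes the poll response shape shown in the end-to-end example. `record_counts_by_node` is a hypothetical helper, and the second node entry is invented for illustration.

```python
def record_counts_by_node(response):
    """Map each nodeId to the number of records in that node's own slice."""
    return {
        node["nodeId"]: len(node.get("records", []))
        for node in response.get("nodes", [])
    }


# Hand-written response; the second peer's nodeId is made up.
response = {"nodes": [
    {"nodeId": "0.0.0.0_11800", "status": "ok",
     "records": [{"samples": []}, {"samples": []}]},
    {"nodeId": "10.0.0.2_11800", "status": "ok",
     "records": [{"samples": []}]},
]}

print(record_counts_by_node(response))
```

Uneven counts across peers are expected: each node records only the source events its own dispatcher processed.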
| |
| ## Failure modes |
| |
| | Response | Meaning | |
| |------------------------------|---------------------------------------------------------------------------| |
| | `400 invalid_catalog` | Catalog must be `oal`. | |
| | `400 missing_param` | `name` or `ruleName` is missing. | |
| | `404 rule_not_found` | No metric for `(name, ruleName)` on this node — typo, or no `.oal` rule loaded. | |
| | `503 injection_disabled` | `injectionEnabled=false`. Restart with the flag on to debug. | |
| |
| ## Limits |
| |
| | Field | Default | Hard cap | Purpose | |
| |-------------------|---------------|-----------------|--------------------------------------------------------| |
| | `recordCap` | `100` | `100` | Max records before the recorder refuses appends. | |
| | `retentionMillis` | `300000` (5m) | `3600000` (1h) | Wall-clock retention. | |
| |
| Out-of-range values return `400 invalid_limits`. Override per-session (within |
| the caps above) in the install body: |
| |
| ```json |
| { "recordCap": 50, "retentionMillis": 600000 } |
| ``` |
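
A client can check an override against the documented caps before sending it, instead of waiting for `400 invalid_limits`. This Python sketch uses the defaults and hard caps from the table above. The lower bound of 1 is an assumption, since the source does not state a minimum, and `limit_problems` is a hypothetical helper.

```python
RECORD_CAP_MAX = 100              # hard cap for recordCap
RETENTION_MILLIS_MAX = 3_600_000  # hard cap for retentionMillis (1h)


def limit_problems(body):
    """Return a list of human-readable problems; empty if the body is within caps."""
    problems = []
    record_cap = body.get("recordCap", 100)           # documented default
    retention = body.get("retentionMillis", 300_000)  # documented default (5m)
    # Assumption: values below 1 are treated as out of range.
    if not 1 <= record_cap <= RECORD_CAP_MAX:
        problems.append(f"recordCap {record_cap} outside 1..{RECORD_CAP_MAX}")
    if not 1 <= retention <= RETENTION_MILLIS_MAX:
        problems.append(
            f"retentionMillis {retention} outside 1..{RETENTION_MILLIS_MAX}")
    return problems


print(limit_problems({"recordCap": 50, "retentionMillis": 600000}))
print(limit_problems({"recordCap": 500}))
```

The server remains the authority on what is accepted; this check only catches values that the table already rules out.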
| |
| ## See also |
| |
| - [DSL Debug API — MAL](dsl-debugging-mal.md) |
| - [DSL Debug API — LAL](dsl-debugging-lal.md) |
| - [SWIP-13](../../../swip/SWIP-13.md) — full design. |