feat(inspect): SWIP-14 Inspect surface + preflight #1
Merged
Adds Studio's binding to SkyWalking's SWIP-14 Inspect API along with
the supporting infrastructure (preflight, server-time, MQE-target
resolution) and an end-to-end Inspect page on /inspect.
Why: SWIP-13's runtime-rule editor only answers "which rules are
loaded". The natural follow-up — "which metrics did that rule produce
and which entities are emitting values right now" — required
operators to drop into MQE by hand. SWIP-14 exposes the metric
catalog + entity enumeration on admin-server (port 17128); this
change wires it into Studio so operators can browse, pick an entity,
and plot the MQE series without leaving the UI.
Wire types (@vantage-studio/api-client):
- InspectClient (GET /inspect/metrics + /inspect/entities)
- MqeEntity / EntityRow / ExpressionResult mirror SkyWalking's
GraphQL Entity input + execExpression response, so the BFF
can paste mqeEntity straight into the query
- formatInspectDate / isInspectDate helpers for the step-specific
yyyy-MM-dd / yyyy-MM-dd HH / yyyy-MM-dd HHmm shapes OAP parses
- 34 round-trip tests against the e2e fixtures
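A minimal sketch of the step-specific date shapes listed above. The signatures, the `Step` type, and the use of UTC fields are assumptions for illustration; the real helpers in @vantage-studio/api-client may differ.

```typescript
// Hypothetical signatures; only the three output shapes come from the PR text.
type Step = "DAY" | "HOUR" | "MINUTE";

const pad2 = (n: number): string => String(n).padStart(2, "0");

// Reads the Date's UTC fields so the output is deterministic. Shifting the
// instant into the server's timezone is assumed to happen before this call.
function formatInspectDate(date: Date, step: Step): string {
  const day = `${date.getUTCFullYear()}-${pad2(date.getUTCMonth() + 1)}-${pad2(date.getUTCDate())}`;
  if (step === "DAY") return day; // yyyy-MM-dd
  const hour = `${day} ${pad2(date.getUTCHours())}`;
  if (step === "HOUR") return hour; // yyyy-MM-dd HH
  return `${hour}${pad2(date.getUTCMinutes())}`; // yyyy-MM-dd HHmm
}

const SHAPES: Record<Step, RegExp> = {
  DAY: /^\d{4}-\d{2}-\d{2}$/,
  HOUR: /^\d{4}-\d{2}-\d{2} \d{2}$/,
  MINUTE: /^\d{4}-\d{2}-\d{2} \d{4}$/,
};

// Shape check only: it does not reject impossible dates such as month 13.
function isInspectDate(value: string, step: Step): boolean {
  return SHAPES[step].test(value);
}
```

Keeping the formatter and the shape check next to each other makes a round-trip test (format, then validate) cheap to write for each step.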
BFF (apps/bff/src/):
- oap/inspect-routes.ts — /api/inspect/{metrics,catalog,entities,
mqe-target,server-time,exec}; all gated on a new `inspect:read`
verb; 404s on /inspect/* promote to `inspect_not_enabled`
- oap/inspect-exec.ts — POST /api/inspect/exec proxies
`query execExpression` against the resolved MQE base (corrected
from the initial `mutation` — execExpression lives on Query in
metrics-v3.graphqls)
- oap/mqe-target.ts — resolves the GraphQL base via
/debugging/config/dump: sharing-server.restPort → core.restPort,
host falls back to the admin URL when the bound host is 0.0.0.0.
studio.yaml `oap.mqe.{host,port}` (both independently optional)
overrides either piece, which covers k8s setups where admin and
REST land on different hostnames.
- oap/server-time.ts — caches `getTimeInfo` from OAP's GraphQL.
The SPA uses the offset to render dates in browser-local time
while sending server-TZ strings on the wire. Accepts both the
legacy integer (`800`) and current string (`"+0800"`/`"-05:00"`)
timezone shapes; falls back to BFF local clock when OAP is
unreachable. AbortController honours oap.timeoutMs.
- oap/preflight.ts + preflight-routes.ts — /api/preflight reads
/debugging/config/dump and reports which of Studio's four
required selectors (admin-server, receiver-runtime-rule,
dsl-debugging, inspect) are loaded.
- inspect/attribution.ts + parser-oal.ts + parser-mal.ts —
/api/inspect/catalog joins /inspect/metrics with a metric-name →
{source,file} index built from /runtime/oal/files +
/runtime/rule/list + /runtime/rule/bundled per MAL catalog. Best
effort per side: when SW_RECEIVER_RUNTIME_RULE is off, the
attribution gracefully degrades to source: "unknown" rather than
failing the whole catalog merge.
- config/schema.ts — `oap.mqe.{host,port}` schema additions.
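The two timezone shapes that oap/server-time.ts accepts can be normalised to an offset in minutes roughly as follows. This is a sketch under assumptions: the function names, the null return for unparseable input, and the ±14:00 bound are illustrative, not taken from the actual module.

```typescript
// Sketch only: names and the null-on-garbage behaviour are assumptions.
function toMinutes(sign: 1 | -1, hours: number, minutes: number): number | null {
  // Real-world UTC offsets stay within 14 hours; anything else is treated as garbage.
  if (hours > 14 || minutes > 59) return null;
  return sign * (hours * 60 + minutes);
}

function parseOffsetMinutes(tz: number | string): number | null {
  if (typeof tz === "number") {
    // Legacy integer shape: 800 means +08:00, -500 means -05:00.
    if (!Number.isInteger(tz)) return null;
    const abs = Math.abs(tz);
    return toMinutes(tz < 0 ? -1 : 1, Math.floor(abs / 100), abs % 100);
  }
  // Current string shape: "+0800" or "-05:00" (the colon is optional).
  const m = /^([+-])(\d{2}):?(\d{2})$/.exec(tz.trim());
  if (!m) return null;
  return toMinutes(m[1] === "-" ? -1 : 1, Number(m[2]), Number(m[3]));
}
```

A caller could then apply the local-clock fallback described above with something like `parseOffsetMinutes(raw) ?? -new Date().getTimezoneOffset()`, though how the BFF actually wires that fallback is not shown in this PR text.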
SPA (apps/ui/src/views/Inspect.vue + supporting):
- New /inspect route with widget board (1 / 3 / 5 per row,
default 3) and per-card chart toggle (line / bar / area).
- Catalog drawer (two-pane): file tree on the left, alphabetical
metric list on the right, "+ all" per-file shortcut, "select
all visible" / "clear" breadcrumb actions.
- Per-widget entity editor: scope-aware form fields (not JSON
paste — the metric's scope is fixed by the catalog, only the
relevant name fields are exposed), multi-select over resolved
entities, custom-entity add for hand-built MQE Entities.
- Preset row: last 10m (default, step MINUTE) / 5h (HOUR) / 2d
(DAY); preset selection sets step + range together.
- Bucket-count guardrail: OAP's DurationUtils caps at 500 buckets
per query, so widgets refuse to fire when start/end/step would
exceed that — operator gets an actionable message instead of a
502.
- Browser-local date inputs; the BFF gets server-TZ strings via
formatForServer(date, step, offsetMinutes). Mirrors the
skywalking-booster-ui pattern but at minute precision.
- localStorage persistence of the board layout (widgets, selected
entities, custom entities, chart kind, density, preset). Reset
button clears widgets + storage. Hydration uses watchEffect so
a vue-query-cached catalog on re-entry still triggers the
restore.
- Cluster status (the landing page) grows a "Required modules"
table fed by /api/preflight. Replaces the header chip + modal
drafts.
- Catalog.vue and OalCatalog.vue get explicit refresh buttons.
- SPA-side BffClient methods + InspectCatalogResponse /
InspectServerTimeResponse / PreflightResponse types.
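The bucket-count guardrail in the list above could look something like this. The 500 cap comes from the PR text; the inclusive bucket counting, the function names, and the message wording are assumptions.

```typescript
type Step = "MINUTE" | "HOUR" | "DAY";

const STEP_MS: Record<Step, number> = {
  MINUTE: 60 * 1000,
  HOUR: 60 * 60 * 1000,
  DAY: 24 * 60 * 60 * 1000,
};

// Cap quoted in this PR for OAP's DurationUtils.
const MAX_BUCKETS = 500;

// Assumption: both the start and the end bucket count, hence the +1.
function countBuckets(startMs: number, endMs: number, step: Step): number {
  if (endMs < startMs) return 0;
  return Math.floor((endMs - startMs) / STEP_MS[step]) + 1;
}

// Returns null when the query may fire, or an operator-facing message when it may not.
function guardRange(startMs: number, endMs: number, step: Step): string | null {
  const buckets = countBuckets(startMs, endMs, step);
  if (buckets <= MAX_BUCKETS) return null;
  return `Range needs ${buckets} ${step} buckets but OAP allows at most ${MAX_BUCKETS}; shorten the range or pick a coarser step.`;
}
```

Under these assumptions the three presets stay well under the cap (11, 6, and 3 buckets), so the guardrail only matters for hand-edited ranges.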
Config / docs:
- docs/inspect.md — operator guide.
- docs/install.md — lists all four required SW_* selectors with a
"what breaks if missing" table; aligns the demo image tag with
apache/skywalking-oap-server:admin-server.
- docs/configure.md — `oap.mqe.*` schema fields, `inspect:read`
in the verb table.
- docs/auth.md — `inspect:read` added to the verb table + role
examples.
Build / CI:
- prettier format pass on every new file + a handful of touched
pre-existing ones (CI runs format:check before lint).
- 117 BFF + 63 UI + 34 api-client tests green; lint clean;
typecheck clean across all three workspaces.
…/data ownership + demo selectors
Five discrete fixes surfaced by the post-implementation review pass (four land in this commit):
1. `mode=revertToBundled` on /api/rule/delete is a structural change
(storage-identity flip — same write-class /api/rule already gates
on `rule:write:structural` for `allowStorageChange=true` /
`force=true`). The handler used to pick `rule:delete` for every
mode, so a caller with only `rule:delete` could call the revert
path directly and have it logged as `rule:delete` in audit. Now
the handler picks `rule:write:structural` when mode is
revertToBundled and `rule:delete` for the default mode; the audit
record carries the actually-checked verb. Added a regression test
that asserts a `reader` role with rule:delete (but no structural)
gets 403.
2. wire/fetch.ts used to `await cloned.text()` on every response —
reading the full body before truncating to maxBodyChars. A
multi-MB /api/dump response would buffer entirely in BFF memory
on every call. Replaced with a streaming reader that:
- skips body capture for known-streamy content types
(application/x-yaml, application/octet-stream, multipart/...,
compression formats);
- returns a "<N-byte response — capped>" marker when the
upstream Content-Length is already past max*4;
- reads at most max*4 bytes from the cloned body and aborts
early via reader.cancel().
The caller's response stream is untouched.
3. /api/inspect/server-time GraphQL fetch now honours
oap.timeoutMs via AbortController, matching every other
OAP-bound BFF call. Previously a hung getTimeInfo could leak
indefinitely.
4. Dockerfile pre-creates /data with nonroot (65532:65532)
ownership in the builder stage and `COPY --from=builder
/seed-data /data` into the runtime image. Docker propagates the
image's directory ownership to a named volume on first mount, so
the BFF can write to studio.yaml / audit.jsonl without an
operator-side chown. The `.keep` file inside is required —
Docker skips entirely empty directories during the seed.
5. deploy/docker/docker-compose.yml's `oap` service was missing
three of the four selectors install.md documents as required.
Added SW_ADMIN_SERVER, SW_DSL_DEBUGGING, SW_INSPECT alongside the
existing SW_RECEIVER_RUNTIME_RULE. The image tag is also aligned
with install.md (`:admin-server`).
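The verb selection from fix 1 reduces to a small pure function, sketched here. The helper names, the `Set`-based grant check, and the return shape are hypothetical; only the two verbs, the `revertToBundled` mode, and the 403 outcome come from the description above.

```typescript
type Verb = "rule:delete" | "rule:write:structural";

// revertToBundled flips storage identity, so it needs the structural verb;
// every other mode keeps the plain delete verb.
function verbForDelete(mode?: string): Verb {
  return mode === "revertToBundled" ? "rule:write:structural" : "rule:delete";
}

// Hypothetical gate: checks the selected verb and reports that same verb for
// the audit record, so the log shows what was actually checked.
function authorizeDelete(
  granted: ReadonlySet<string>,
  mode?: string,
): { status: 200 | 403; auditVerb: Verb } {
  const verb = verbForDelete(mode);
  return { status: granted.has(verb) ? 200 : 403, auditVerb: verb };
}
```

Deriving the audit verb from the same call that performs the check is what keeps the two from drifting apart again.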
Fix 3 (server-time timeout) lands in the inspect commit above
because it's strictly internal to the new feature.
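For fix 2, the decision that happens before any bytes are read can be sketched as a pure function. The `CapturePlan` type, the function name, and the exact list of compression media types are assumptions; the skip / marker / capped-read rules and the `max*4` budget follow the description above.

```typescript
type CapturePlan =
  | { kind: "skip" }
  | { kind: "marker"; text: string }
  | { kind: "read"; maxBytes: number };

// The first three entries are named in the fix; the compression types are illustrative guesses.
const STREAMY_PREFIXES = [
  "application/x-yaml",
  "application/octet-stream",
  "multipart/",
  "application/gzip",
  "application/zip",
];

function planCapture(
  contentType: string | null,
  contentLength: string | null,
  maxBodyChars: number,
): CapturePlan {
  const type = (contentType ?? "").toLowerCase();
  if (STREAMY_PREFIXES.some((prefix) => type.startsWith(prefix))) return { kind: "skip" };

  const budget = maxBodyChars * 4;
  const declared = contentLength === null ? NaN : Number(contentLength);
  if (Number.isFinite(declared) && declared > budget) {
    // The upstream already says it is too large, so nothing is read at all.
    return { kind: "marker", text: `<${declared}-byte response — capped>` };
  }
  // Otherwise read at most `budget` bytes from the clone, then cancel the reader.
  return { kind: "read", maxBytes: budget };
}
```

Only the `read` case touches the cloned body, which is what keeps a multi-MB response from being buffered just to be truncated afterwards.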
Tests: 140 BFF + 63 UI + 34 api-client all green; format:check
clean; eslint clean; typecheck clean.
The "live · 5s" indicator on cluster status was hardcoded — an operator who wanted a tighter pulse during a flap or a looser one during a quiet period had to live with it. Adds a small dropdown next to the indicator with off / 5s / 15s / 60s options. Selection is persisted per-browser (localStorage `vs:cluster:poll:v1`); falls back to 5s when no preference exists. Wiring uses vue-query's function-form `refetchInterval`, so the dropdown takes effect on the next tick without remounting the queries. Applies to both cluster-state and dsl-debugging-status panes (they share a cadence by design — both are "what's happening right now" views). Preflight is unchanged — that one stays at 30s because its underlying data (which OAP modules are loaded) only flips on a restart. When `off` is selected the indicator switches from "live · 5s" to "manual" with a dimmed dot, and the "refresh now" button is still the explicit escape hatch.
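A rough sketch of how the stored preference could map onto a function-form `refetchInterval`. The helper names and the assumption that the dropdown label itself is what gets stored under `vs:cluster:poll:v1` are illustrative; vue-query is left out so the block stays self-contained.

```typescript
type PollChoice = "off" | "5s" | "15s" | "60s";

// Key quoted in the commit message; the stored value format is an assumption.
const STORAGE_KEY = "vs:cluster:poll:v1";

const INTERVAL_MS: Record<PollChoice, number | false> = {
  off: false,
  "5s": 5000,
  "15s": 15000,
  "60s": 60000,
};

// Anything missing or unrecognised falls back to the 5s default.
function parseChoice(raw: string | null): PollChoice {
  return raw === "off" || raw === "5s" || raw === "15s" || raw === "60s" ? raw : "5s";
}

// Function form: the query library calls this on each cycle, so a changed
// choice is picked up without remounting the query.
function makeRefetchInterval(getChoice: () => PollChoice): () => number | false {
  return () => INTERVAL_MS[getChoice()];
}

function indicatorLabel(choice: PollChoice): string {
  return choice === "off" ? "manual" : `live · ${choice}`;
}
```

In the SPA, `getChoice` would presumably read a ref initialised from `parseChoice(localStorage.getItem(STORAGE_KEY))`, and the returned function would be passed as `refetchInterval` to both the cluster-state and dsl-debugging-status queries.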
Summary
End-to-end binding to SkyWalking's SWIP-14 Inspect API plus the supporting infrastructure (preflight, server-time, MQE-target resolution, source attribution) and a new `/inspect` page that lets operators browse the metric catalog, pick the entity that holds values, and plot the MQE series, all without leaving Studio.

Two commits:
- `feat(inspect): SWIP-14 Inspect surface + preflight + server-TZ handling`: the feature.
- `fix(review): RBAC gate on revertToBundled; wire-log body cap; docker /data ownership + demo selectors`: the post-implementation review-pass findings.

What landed

@vantage-studio/api-client
- `InspectClient` (`GET /inspect/metrics`, `GET /inspect/entities`) + the wire types. `MqeEntity` / `EntityRow` / `ExpressionResult` mirror SkyWalking's GraphQL `Entity` input and `execExpression` response so the BFF can paste `mqeEntity` straight into the query.
- `formatInspectDate` / `isInspectDate` helpers for the three step-specific formats (`yyyy-MM-dd` / `yyyy-MM-dd HH` / `yyyy-MM-dd HHmm`).

BFF (apps/bff/src/)
- `GET /api/inspect/metrics`: backed by `/inspect/metrics`.
- `GET /api/inspect/entities`: backed by `/inspect/entities`.
- `GET /api/inspect/catalog`: joins `/inspect/metrics` with the metric-name → {source,file} attribution index.
- `GET /api/inspect/mqe-target`: resolved from `/debugging/config/dump`. `studio.yaml`'s `oap.mqe.{host,port}` (each optional) overrides the discovered values; covers k8s setups where admin and REST land on different ingresses.
- `POST /api/inspect/exec`: proxies `query execExpression` against the resolved base.
- `GET /api/inspect/server-time`: caches `getTimeInfo`. SPA uses the offset to display browser-local dates while sending server-TZ strings on the wire.
- `GET /api/preflight`: reports which of Studio's four required selectors are loaded.
- `inspect:read` is the new RBAC verb gating every `/api/inspect/*` route.

SPA (/inspect)
- Catalog drawer: `+ all` per-file shortcut, `select all visible` / `clear` breadcrumb actions.
- Per-widget entity editor: scope-aware form fields (the scope is fixed by `/inspect/metrics`, only the relevant name fields are exposed), multi-select over resolved entities, custom-entity add.
- Presets: `last 10m` (default, MINUTE) / `last 5h` (HOUR) / `last 2d` (DAY); preset sets step + range together.
- Guardrail for OAP's `DurationUtils.MAX_TIME_RANGE = 500` cap: widget refuses to fire and shows an actionable message instead of letting OAP 502.
- `localStorage` persistence of the board layout (widgets, selected entities, custom entities, chart kind, density, preset). Reset button clears widgets + storage.
- Cluster status: `REQUIRED MODULES` table fed by `/api/preflight`, an actionable "OAP-side selectors Studio needs" view instead of a header chip + modal.
- `Catalog.vue` and `OalCatalog.vue` grow explicit refresh buttons.

Docs
- `docs/inspect.md`: operator guide.
- `docs/install.md`: lists all four required `SW_*` selectors with a "what breaks if missing" table; aligns the demo image tag with `apache/skywalking-oap-server:admin-server`.
- `docs/configure.md`: `oap.mqe.*` schema fields; `inspect:read` in the verb table.
- `docs/auth.md`: `inspect:read` in the verb table + role examples.

Review-pass fixes (second commit)
- `revertToBundled` now requires `rule:write:structural` (was `rule:delete`). Audit log carries the actually-checked verb. New regression test asserts a role with only `rule:delete` gets 403. (`apps/bff/src/oap/routes.ts`)
- The demo compose file's `oap` service was missing three of the four required selectors. Added `SW_ADMIN_SERVER`, `SW_DSL_DEBUGGING`, `SW_INSPECT`; aligned the image tag with `install.md`. (`deploy/docker/docker-compose.yml`)
- `pnpm format:check` was failing on the new files. Ran prettier across the tree.
- `/api/inspect/server-time` GraphQL fetch now honours `oap.timeoutMs` via `AbortController` (matches the rest of the OAP-bound calls). (`apps/bff/src/oap/server-time.ts`)
- `wire/fetch.ts` no longer buffers the entire response body before truncating: streamy content types are skipped, `Content-Length > max*4` returns a header-only marker, and the cloned reader bails after `max*4` bytes via `reader.cancel()`. (`apps/bff/src/wire/fetch.ts`)
- Dockerfile pre-creates `/data` with nonroot (65532:65532) ownership in the builder stage and copies it into the runtime image. Docker propagates that to the named volume on first mount, so the BFF can seed `studio.yaml` / `audit.jsonl` without an operator-side chown. (`deploy/docker/Dockerfile`)

Test plan
- `pnpm lint`
- `pnpm format:check`
- `pnpm -F @vantage-studio/{ui,bff,api-client} typecheck`
- `pnpm test`: 140 BFF + 63 UI + 34 api-client tests green.
- `pnpm -F @vantage-studio/ui build`
- `pnpm -F @vantage-studio/bff build`
- Against an OAP with `SW_ADMIN_SERVER=default` + the three SWIP-13/14 selectors: `/inspect` resolves 1746 metrics with 477 attributed to OAL + 1150 to MAL·OTEL + 4 to LAL→MAL + 115 unknown; `service_cpm` for `e2e-service-provider` returns a non-empty `TIME_SERIES_VALUES`; preflight reports all four modules enabled.
- `docker compose up` brings up Studio + OAP + BanyanDB with every required selector wired (smoke against the updated compose file).