Skip to content

fix(viewer): compare-mode bridge KPIs render both arms in section + Notebook#910

Merged
thinkingfish merged 7 commits into
iopsystems:mainfrom
thinkingfish:fix/compare-multidim-percentile-labels
May 11, 2026
Merged

fix(viewer): compare-mode bridge KPIs render both arms in section + Notebook#910
thinkingfish merged 7 commits into
iopsystems:mainfrom
thinkingfish:fix/compare-multidim-percentile-labels

Conversation

@thinkingfish

Copy link
Copy Markdown
Member

Summary

Three related bugs along the compare-mode rendering path for category-bridge KPIs (e.g. `inference-library` TTFT/Prompt Token Rate, where the dashboard generates a vllm-side query and a sglang-side query for one chart). Hit at the same time when pinning a bridge chart into the Notebook.

1. Asymmetric label extraction for percentile (scatter) charts

Baseline read `spec.series_names` (string list, lossy on multi-dim metrics — only the first non-`name` label survives) then ran `canonicalQuantileLabel` over the string. Experiment read `item.metric` directly via `canonicalQuantileLabel(item)` which preferred `metric.quantile` and dropped extra dims. The two sides produced disjoint match keys → "compare: no shared labels between captures."

Fix: `composeScatterLabel(metric, options)` in `compare_math.js` — multi-dim aware, with an `excludeValues` set (the category bridge) that drops dims whose values are capture-identity names. `applyResultToPlot` now also stashes `spec.series_metrics` so the baseline path has the raw metric objects available. Both extracts forward `options` symmetrically. Falls back to `canonicalQuantileLabel` for malformed metrics.

2. Pin loses bridge metadata

Pinning a bridge chart kept only `spec.promql_query`. Re-rendering in the Notebook ran the baseline query on the experiment side too, returning empty.

Fix: entries gain two optional fields, `promql_query_experiment` and `category_members`, captured at pin time from the spec + Group's `sectionMetadata`. Round-tripped through localStorage persist/restore and JSON export/import, defaulting to null on older payloads. `chartLoaderMixin` restores `promql_query_experiment` onto the rebuilt spec; NotebookView passes `category_members` to `CompareChartWrapper` as `categoryMembers`.

3. Pinned charts node-inject the wrong host

NotebookView mounted `CompareChartWrapper` with `sectionRoute: '/notebook'`. `buildEffectiveQuery` skips node-label injection for `/service/*` routes (service KPIs are scoped by `source=...`, not node), but `/notebook` doesn't qualify, so the experiment-side fetch ended up with the baseline's hostname injected into the bridge query — zero matches against the experiment recording, server returned "Metric not found". Only the baseline trace rendered.

Fix: use `entry.section` (the chart's home route) instead of `/notebook`. Bridge KPIs now skip node injection symmetrically with their original section view. Non-bridge pinned charts also keep their original section context — slightly different from before but more consistent.

Plumbing

`renderServiceSection` forwards section-level metadata as `sectionMetadata: meta` (a new prop, not `metadata` which is reserved for per-group config). Group reads `attrs.sectionMetadata?.category_members`, threads it through chartBody → CompareChartWrapper → both extract* functions → composeScatterLabel.

Test plan

  • cargo build clean, viewer smoke 9/9
  • node tests 47/48 (the 1 failure is the pre-existing wasm comparison test)
  • 15 new unit tests in compare_math.test.mjs covering composeScatterLabel: pure quantile, quantile + N extra dims (alpha-sorted), no-quantile fallback, malformed input, excludeValues option (drop matched, preserve non-matched, empty/missing → no-op), baseline=experiment symmetry assertion
  • Manual: `view vllm=vllm.parquet sglang=sglang.parquet --category inference-library` — TTFT and other bridge KPIs render both arms in the inference-library section
  • Manual: pin a bridge KPI from the section, navigate to Notebook — both arms render (was: only baseline; experiment query had `node=""` injected and returned empty)

Known limitations

  • The `composeScatterLabel` excludeValues filter is by VALUE (drops any dim whose value matches a category-member name). Footgun: if some unrelated label happens to carry a value like `vllm` or `sglang` (e.g. `binary=vllm`), it gets dropped too. In practice this is fine — the category-member names are distinct enough.
  • Multi-dim percentile compare for non-bridge sections (e.g. per-cgroup percentile) is no longer broken (was the original sub-bug 1) but isn't smoke-tested end-to-end — the unit tests cover the helper and the section/group plumbing pattern is straightforward.

🤖 Generated with Claude Code

thinkingfish and others added 7 commits May 10, 2026 23:09
Compare mode failed with "compare: no shared labels between
captures" on percentile (scatter) charts whose series carry extra
dims beyond `quantile`. Two distinct sub-bugs converged here:

(1) Asymmetric label extraction:

  Baseline (extractBaselineCapture): read spec.series_names —
    strings precomputed by applyResultToPlot as the FIRST
    non-`__name__` metric value (order-dependent; on a multi-dim
    metric it picks "vllm" or "0.5" depending on iteration order).
    For scatter, fell back through canonicalQuantileLabel which
    only matches pure quantile strings — multi-dim strings like
    "0.5,vllm" produce null and the original value stands.

  Experiment (extractExperimentCapture): read item.metric directly
    via canonicalQuantileLabel(item), which prefers metric.quantile
    and returns just "p50" — extra dims dropped entirely.

  Result: baseline keys looked like {"vllm","sglang"} (or
  {"0.5","0.95"}) and experiment keys looked like {"p50","p95"}.

(2) Capture-identity dims in match keys:

  In category-mode compare (e.g. inference-library, baseline=vllm
  vs experiment=sglang) the per-side queries produce series whose
  labels intentionally differ on `source=vllm` vs `source=sglang`.
  Even with symmetric extraction these dims must NOT contribute
  to the cross-capture match key — they distinguish the captures
  themselves, not subseries within each side.

Fix: a single `composeScatterLabel(metric, options)` helper that
canonicalizes the quantile dim to "pXX" and appends remaining dims
(alpha-sorted by key, value-only) joined with " · ", with an
`options.excludeValues` set that drops dims whose values match
known capture-identity values (the category-bridge plumbing).

Wiring:
- applyResultToPlot (data.js) now also stashes plot.series_metrics
  (parallel to series_names) so the baseline path has the raw
  metric objects available — series_names alone is too lossy.
- extractBaselineCapture / extractExperimentCapture both take an
  options arg and forward it to composeScatterLabel.
- CompareChartWrapper reads attrs.categoryMembers, builds an
  excludeValues set, and passes it to both extracts.
- Group reads attrs.sectionMetadata.category_members and forwards
  it as the categoryMembers attr on CompareChartWrapper.
- renderServiceSection (service.js) forwards section metadata as
  a NEW `sectionMetadata` prop on Group — not `metadata`, which
  is reserved for per-group config like no_collapse.

Pure-quantile metrics (no extra dims) round-trip identically to
the previous behavior — composeScatterLabel({quantile:"0.5"})
returns "p50". Non-category sections pass no categoryMembers, so
excludeValues stays empty and all dims are preserved (fixes the
per-cgroup percentile compare case from sub-bug 1). Category
sections pass member names, so source=vllm gets dropped (fixes
the inference-library TTFT case from sub-bug 2).

Tests: 15 new cases in compare_math.test.mjs cover pure quantile,
quantile + N extra dims, no-quantile fallback, malformed input,
the excludeValues option (dropping matched dims, preserving
non-matched dims, empty/missing → no-op), and the explicit
baseline=experiment symmetry assertion for both sub-bugs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a category-bridge KPI (one with `spec.promql_query_experiment`
set by the dashboard's category builder, e.g. inference-library
TTFT) gets pinned, the entry kept only the baseline query. Re-
rendering in compare mode then ran the baseline query on the
experiment side too — different metric name, no result — and the
user saw only the baseline trace ("Notebook only shows vllm on
the chart").

Two new optional fields on each pinned entry:
- `promql_query_experiment` — the per-side experiment query the
  category builder generates. Restored onto the rebuilt spec in
  chartLoaderMixin so CompareChartWrapper's existing
  `spec.promql_query_experiment || spec.promql_query` precedence
  picks the right query when fetching experiment data.
- `category_members` — the member-name set forwarded as
  `categoryMembers` to CompareChartWrapper, so composeScatterLabel
  drops the capture-identity dim from the cross-capture match key
  even though the live Notebook section has no category metadata.

Both fields are captured from the chart's section context at pin
time (selectButton gains a `categoryMembers` field on its
compareMeta arg, plumbed from Group's `attrs.sectionMetadata
.category_members`). They round-trip through localStorage persist
+ restore and through buildPayload (JSON export) + entriesFromPayload
(JSON load), defaulting to null on older payloads — pre-fix
exports degrade to single-query compare (the original broken
behavior, no worse than before the field existed).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NotebookView mounted CompareChartWrapper with a hardcoded
sectionRoute: '/notebook' for every pinned chart. buildEffectiveQuery
gates topology-label injection on the sectionRoute prefix:
service-section routes (/service/*) skip node injection because
service KPIs are already scoped by source=... and don't carry a
node label.

Hardcoding /notebook bypassed that gate. The experiment-side fetch
for a pinned bridge KPI (e.g., inference-library TTFT) ended up
running:

  sglang_prompt_tokens_total{source="sglang", node="<baseline-host>"}

against the experiment capture. The baseline's hostname doesn't
exist in the experiment recording, so PromQL returned a
"Metric not found" error and only the baseline trace rendered.

Use entry.section (the section the chart was originally pinned
from) to reconstruct the sectionRoute. Bridge KPIs pinned from
/service/inference-library now route as /service/inference-library
in compare mode, skipping node injection symmetrically with the
original section view. Non-bridge pinned charts (CPU, network,
etc.) keep their original sectionRoute too — slightly different
behavior from before, but more consistent: a pinned chart now
renders with the same topology-injection rules it had in its
home section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commits over-explained — long docstrings restating
what the code does, history-of-the-bug exposition, and example
blocks that duplicated the test cases. Cut to one or two lines
per comment site that captures the WHY a future reader can't
recover from the code.

No behavior change. compare_math tests + viewer smoke still green.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pinned histogram charts in compare mode now render the same
Full / Tail / Diff (scatter) or diff (heatmap) toggles the
section view shows, by reusing chart_controls.compareToggle.
The state already round-trips through notebookStore.chartToggles
and persistNotebook — only the render was missing.

Wraps the chart title in a chart-title-row so the toggle group
sits on the right, matching the section-view chartHeader layout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The schema already carries anchors / chartToggles / compare per
store, and Save as Report captures all of it in compare mode —
but the load-side views ignored it and rendered single-Chart only.

Both views now mirror NotebookView's compare-mode contract:
- chartBody picks CompareChartWrapper when attrs.compareMode and
  spec.promql_query are present
- chart-header renders compareToggle (Full / Tail / Diff)
- per-store chartToggles + anchors threaded into the wrapper
- bridge metadata (promql_query_experiment, category_members)
  on the entry restores the per-side query and the
  capture-identity exclusion set

Toggle writes go to the right store: factored out a generic
chartToggleSetter(store, persistFn) and bound three setters —
notebook (persists), report (persists), loaded-selection
(in-memory only, matches its store lifecycle).

app.js mounts now pass compareMode, experimentQueryRange, and
the alias pair through to both views.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Was hidden in compare mode under the (now-stale) assumption that
selections are single-capture. Previous commit made
LoadedSelectionView compare-mode capable — both arms render when
an experiment is attached. Drop the !compareMode gate so the
button is reachable from the compare-mode topnav.
@thinkingfish thinkingfish marked this pull request as ready for review May 11, 2026 08:05
@thinkingfish thinkingfish merged commit adadf55 into iopsystems:main May 11, 2026
21 checks passed
@thinkingfish thinkingfish deleted the fix/compare-multidim-percentile-labels branch May 11, 2026 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant