fix(viewer): compare-mode bridge KPIs render both arms in section + Notebook#910
Merged
thinkingfish merged 7 commits intoMay 11, 2026
Conversation
Compare mode failed with "compare: no shared labels between
captures" on percentile (scatter) charts whose series carry extra
dims beyond `quantile`. Two distinct sub-bugs converged here:
(1) Asymmetric label extraction:
Baseline (extractBaselineCapture): read spec.series_names —
strings precomputed by applyResultToPlot as the FIRST
non-`__name__` metric value (order-dependent; on a multi-dim
metric it picks "vllm" or "0.5" depending on iteration order).
For scatter, fell back through canonicalQuantileLabel which
only matches pure quantile strings — multi-dim strings like
"0.5,vllm" produce null and the original value stands.
Experiment (extractExperimentCapture): read item.metric directly
via canonicalQuantileLabel(item), which prefers metric.quantile
and returns just "p50" — extra dims dropped entirely.
Result: baseline keys looked like {"vllm","sglang"} (or
{"0.5","0.95"}) and experiment keys looked like {"p50","p95"}.
(2) Capture-identity dims in match keys:
In category-mode compare (e.g. inference-library, baseline=vllm
vs experiment=sglang) the per-side queries produce series whose
labels intentionally differ on `source=vllm` vs `source=sglang`.
Even with symmetric extraction these dims must NOT contribute
to the cross-capture match key — they distinguish the captures
themselves, not subseries within each side.
Fix: a single `composeScatterLabel(metric, options)` helper that
canonicalizes the quantile dim to "pXX" and appends remaining dims
(alpha-sorted by key, value-only) joined with " · ", with an
`options.excludeValues` set that drops dims whose values match
known capture-identity values (the category-bridge plumbing).
Wiring:
- applyResultToPlot (data.js) now also stashes plot.series_metrics
(parallel to series_names) so the baseline path has the raw
metric objects available — series_names alone is too lossy.
- extractBaselineCapture / extractExperimentCapture both take an
options arg and forward it to composeScatterLabel.
- CompareChartWrapper reads attrs.categoryMembers, builds an
excludeValues set, and passes it to both extracts.
- Group reads attrs.sectionMetadata.category_members and forwards
it as the categoryMembers attr on CompareChartWrapper.
- renderServiceSection (service.js) forwards section metadata as
a NEW `sectionMetadata` prop on Group — not `metadata`, which
is reserved for per-group config like no_collapse.
Pure-quantile metrics (no extra dims) round-trip identically to
the previous behavior — composeScatterLabel({quantile:"0.5"})
returns "p50". Non-category sections pass no categoryMembers, so
excludeValues stays empty and all dims are preserved (fixes the
per-cgroup percentile compare case from sub-bug 1). Category
sections pass member names, so source=vllm gets dropped (fixes
the inference-library TTFT case from sub-bug 2).
Tests: 15 new cases in compare_math.test.mjs cover pure quantile,
quantile + N extra dims, no-quantile fallback, malformed input,
the excludeValues option (dropping matched dims, preserving
non-matched dims, empty/missing → no-op), and the explicit
baseline=experiment symmetry assertion for both sub-bugs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a category-bridge KPI (one with `spec.promql_query_experiment`
set by the dashboard's category builder, e.g. inference-library
TTFT) gets pinned, the entry kept only the baseline query. Re-
rendering in compare mode then ran the baseline query on the
experiment side too — different metric name, no result — and the
user saw only the baseline trace ("Notebook only shows vllm on
the chart").
Two new optional fields on each pinned entry:
- `promql_query_experiment` — the per-side experiment query the
category builder generates. Restored onto the rebuilt spec in
chartLoaderMixin so CompareChartWrapper's existing
`spec.promql_query_experiment || spec.promql_query` precedence
picks the right query when fetching experiment data.
- `category_members` — the member-name set forwarded as
`categoryMembers` to CompareChartWrapper, so composeScatterLabel
drops the capture-identity dim from the cross-capture match key
even though the live Notebook section has no category metadata.
Both fields are captured from the chart's section context at pin
time (selectButton gains a `categoryMembers` field on its
compareMeta arg, plumbed from Group's `attrs.sectionMetadata
.category_members`). They round-trip through localStorage persist
+ restore and through buildPayload (JSON export) + entriesFromPayload
(JSON load), defaulting to null on older payloads — pre-fix
exports degrade to single-query compare (the original broken
behavior, no worse than before the field existed).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NotebookView mounted CompareChartWrapper with a hardcoded
sectionRoute: '/notebook' for every pinned chart. buildEffectiveQuery
gates topology-label injection on the sectionRoute prefix:
service-section routes (/service/*) skip node injection because
service KPIs are already scoped by source=... and don't carry a
node label.
Hardcoding /notebook bypassed that gate. The experiment-side fetch
for a pinned bridge KPI (e.g., inference-library TTFT) ended up
running:
sglang_prompt_tokens_total{source="sglang", node="<baseline-host>"}
against the experiment capture. The baseline's hostname doesn't
exist in the experiment recording, so PromQL returned a
"Metric not found" error and only the baseline trace rendered.
Use entry.section (the section the chart was originally pinned
from) to reconstruct the sectionRoute. Bridge KPIs pinned from
/service/inference-library now route as /service/inference-library
in compare mode, skipping node injection symmetrically with the
original section view. Non-bridge pinned charts (CPU, network,
etc.) keep their original sectionRoute too — slightly different
behavior from before, but more consistent: a pinned chart now
renders with the same topology-injection rules it had in its
home section.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commits over-explained — long docstrings restating what the code does, history-of-the-bug exposition, and example blocks that duplicated the test cases. Cut to one or two lines per comment site that captures the WHY a future reader can't recover from the code. No behavior change. compare_math tests + viewer smoke still green. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pinned histogram charts in compare mode now render the same Full / Tail / Diff (scatter) or diff (heatmap) toggles the section view shows, by reusing chart_controls.compareToggle. The state already round-trips through notebookStore.chartToggles and persistNotebook — only the render was missing. Wraps the chart title in a chart-title-row so the toggle group sits on the right, matching the section-view chartHeader layout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The schema already carries anchors / chartToggles / compare per store, and Save as Report captures all of it in compare mode — but the load-side views ignored it and rendered single-Chart only. Both views now mirror NotebookView's compare-mode contract: - chartBody picks CompareChartWrapper when attrs.compareMode and spec.promql_query are present - chart-header renders compareToggle (Full / Tail / Diff) - per-store chartToggles + anchors threaded into the wrapper - bridge metadata (promql_query_experiment, category_members) on the entry restores the per-side query and the capture-identity exclusion set Toggle writes go to the right store: factored out a generic chartToggleSetter(store, persistFn) and bound three setters — notebook (persists), report (persists), loaded-selection (in-memory only, matches its store lifecycle). app.js mounts now pass compareMode, experimentQueryRange, and the alias pair through to both views. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Was hidden in compare mode under the (now-stale) assumption that selections are single-capture. Previous commit made LoadedSelectionView compare-mode capable — both arms render when an experiment is attached. Drop the !compareMode gate so the button is reachable from the compare-mode topnav.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three related bugs along the compare-mode rendering path for category-bridge KPIs (e.g. `inference-library` TTFT/Prompt Token Rate, where the dashboard generates a vllm-side query and a sglang-side query for one chart). Hit at the same time when pinning a bridge chart into the Notebook.
1. Asymmetric label extraction for percentile (scatter) charts
Baseline read `spec.series_names` (string list, lossy on multi-dim metrics — only the first non-`name` label survives) then ran `canonicalQuantileLabel` over the string. Experiment read `item.metric` directly via `canonicalQuantileLabel(item)` which preferred `metric.quantile` and dropped extra dims. The two sides produced disjoint match keys → "compare: no shared labels between captures."
Fix: `composeScatterLabel(metric, options)` in `compare_math.js` — multi-dim aware, with an `excludeValues` set (the category bridge) that drops dims whose values are capture-identity names. `applyResultToPlot` now also stashes `spec.series_metrics` so the baseline path has the raw metric objects available. Both extracts forward `options` symmetrically. Falls back to `canonicalQuantileLabel` for malformed metrics.
2. Pin loses bridge metadata
Pinning a bridge chart kept only `spec.promql_query`. Re-rendering in the Notebook ran the baseline query on the experiment side too, returning empty.
Fix: entries gain two optional fields, `promql_query_experiment` and `category_members`, captured at pin time from the spec + Group's `sectionMetadata`. Round-tripped through localStorage persist/restore and JSON export/import, defaulting to null on older payloads. `chartLoaderMixin` restores `promql_query_experiment` onto the rebuilt spec; NotebookView passes `category_members` to `CompareChartWrapper` as `categoryMembers`.
3. Pinned charts node-inject the wrong host
NotebookView mounted `CompareChartWrapper` with `sectionRoute: '/notebook'`. `buildEffectiveQuery` skips node-label injection for `/service/*` routes (service KPIs are scoped by `source=...`, not node), but `/notebook` doesn't qualify, so the experiment-side fetch ended up with the baseline's hostname injected into the bridge query — zero matches against the experiment recording, server returned "Metric not found". Only the baseline trace rendered.
Fix: use `entry.section` (the chart's home route) instead of `/notebook`. Bridge KPIs now skip node injection symmetrically with their original section view. Non-bridge pinned charts also keep their original section context — slightly different from before but more consistent.
Plumbing
`renderServiceSection` forwards section-level metadata as `sectionMetadata: meta` (a new prop, not `metadata` which is reserved for per-group config). Group reads `attrs.sectionMetadata?.category_members`, threads it through chartBody → CompareChartWrapper → both extract* functions → composeScatterLabel.
Test plan
Known limitations
🤖 Generated with Claude Code