feat(dashboard): histogram rate|mean card pair #938

Merged
thinkingfish merged 19 commits into
iopsystems:main from
thinkingfish:feat/histogram-rate-mean-pair
Jun 1, 2026

Member

@thinkingfish thinkingfish commented May 24, 2026

Summary

For every percentile-subtype histogram plot in the dashboard, emit an adjacent half-width Rate + Mean card pair.

  • Mean is always derived from the histogram column via histogram_mean(<sel>) (metriken-query 0.10.5+). Robust to lock-free snapshot undercount because it's a ratio.
  • Rate prefers an accurate standalone counter when one counts the same events (blockio_operations for blockio latency/size, syscall for syscall latency — these existing rate cards are kept verbatim, the only change is adding the adjacent Mean). Falls back to sum(histogram_irate(<sel>)) (metriken-query 0.10.6+) for histogram-only families — scheduler_runqueue_latency, scheduler_offcpu, scheduler_running, tcp_packet_latency.

Strictly additive: no agent / parquet / exporter / MCP changes. Existing percentile cards untouched.

A small helper SubGroup::histogram_rate_mean(title_stem, id_stem, selector, RateSource, mean_unit) keeps the pair adjacent and declarative at call sites.
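
As a rough illustration of the query selection described above, here is a minimal sketch of how the pair's PromQL strings might be derived. Only the `RateSource` name and the three query forms come from this PR; the enum variants, `CardPair`, and `rate_mean_queries` are hypothetical stand-ins, and the real `SubGroup::histogram_rate_mean` also takes titles, ids, and a unit that are omitted here.

```rust
/// Where the Rate card gets its series from.
/// The variant names are assumptions for illustration only.
pub enum RateSource<'a> {
    /// An existing counter-backed rate query that counts the same events.
    /// It is passed through verbatim, matching "kept verbatim" in the summary.
    Counter(&'a str),
    /// No matching counter exists: derive the rate from the histogram itself.
    Histogram,
}

/// The two PromQL queries behind one Rate + Mean card pair (hypothetical type).
pub struct CardPair {
    pub rate_query: String,
    pub mean_query: String,
}

/// Build both queries for a histogram selector.
pub fn rate_mean_queries(selector: &str, rate: RateSource<'_>) -> CardPair {
    let rate_query = match rate {
        // Counter-backed families keep their existing rate query unchanged.
        RateSource::Counter(existing) => existing.to_string(),
        // Histogram-only families fall back to the instant-vector form.
        RateSource::Histogram => format!("sum(histogram_irate({selector}))"),
    };
    CardPair {
        rate_query,
        // Mean is always derived from the histogram column.
        mean_query: format!("histogram_mean({selector})"),
    }
}
```

Under these assumptions, `rate_mean_queries("scheduler_runqueue_latency", RateSource::Histogram)` yields `sum(histogram_irate(scheduler_runqueue_latency))` for Rate and `histogram_mean(scheduler_runqueue_latency)` for Mean.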

Why the fallback uses histogram_irate (not the originally spec'd form)

The original spec called for sum(irate(histogram_count(<sel>)[5m])). That doesn't parse — PromQL grammar disallows range vectors on function-call results. In metriken-query's evaluation model (one value per step from the histogram column) the [5m] window is vestigial anyway, so the cleaner contract is an instant-vector function with no window. metriken-query 0.10.6 ships histogram_irate(m) for exactly this; sum(histogram_irate(<sel>)) composes directly.
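
To make the "no window needed" point concrete, the sketch below computes a per-step instant rate from consecutive cumulative counts. It is an assumption about the general idea only, not metriken-query's implementation of `histogram_irate`; `Sample` and `instant_rate` are hypothetical names, and the handling of resets and the first step is a guess.

```rust
/// One evaluation step: a timestamp and the histogram's cumulative event count.
/// Hypothetical type for illustration.
#[derive(Clone, Copy)]
pub struct Sample {
    pub t_secs: f64,
    pub count: u64,
}

/// Per-step instant rate: (count[i] - count[i-1]) / (t[i] - t[i-1]).
/// Each output depends only on the adjacent pair of steps, so no range
/// window such as `[5m]` is involved.
/// Yields `None` for the first step, for a non-positive time delta, and
/// when the count goes backwards (treated here as a reset; an assumption).
pub fn instant_rate(samples: &[Sample]) -> Vec<Option<f64>> {
    let mut out = Vec::with_capacity(samples.len());
    for (i, cur) in samples.iter().enumerate() {
        let rate = if i == 0 {
            None
        } else {
            let prev = samples[i - 1];
            let dt = cur.t_secs - prev.t_secs;
            if dt > 0.0 && cur.count >= prev.count {
                Some((cur.count - prev.count) as f64 / dt)
            } else {
                None
            }
        };
        out.push(rate);
    }
    out
}
```

Because the result already has one value per step, it can be aggregated directly, which is the property that lets `sum(histogram_irate(<sel>))` compose without a range selector.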

Drive-by

Smoke-test selection.js path updated for the lib/ subdir refactor in #932 (separate commit for clean blame).

Test plan

  • cargo test -p dashboard — 39 pass
  • cargo clippy --all-targets -- -D warnings — clean
  • bash tests/viewer_smoke.sh — all assertions pass
  • cargo run -p dashboard -- /tmp/dump/ — verified scheduler.json + network.json carry sum(histogram_irate(...)); blockio/syscall keep their counter-backed rates; every pair card is half-width
  • Manual eyeball on a fresh Linux recording — confirm the new Rate/Mean cards render with sane lines, idle gaps break (not 0) on Mean, units format correctly. Local fixtures in my worktree aren't queryable (pre-existing data-shape mismatch, not a regression) so this needs to happen on real hardware.

selection.js moved to lib/selection/selection.js in iopsystems#932; the smoke
assertion was still curling /lib/selection.js and 404'ing.

Bump metriken-query 0.10.5 → 0.10.6 for histogram_irate(m). Replaces
the spec'd irate(histogram_count(m)[5m]) form, which doesn't parse:
PromQL grammar disallows range vectors on function-call results.
histogram_irate returns an instant vector — composes with sum(...)
directly and sidesteps the grammar problem.

Bumps metriken-query 0.10.7 → 0.10.8 for histogram_sum(m). Dashboard's
histogram_rate_mean helper now renames the Mean card to Mean/Total and
stashes histogram_sum(<selector>) on the spec as promql_query_total.
The line-chart configure path renders an inline checkbox top-right that
fetches and swaps to the total series on click; tooltip series label
substitutes Mean ↔ Total to match the active view. Per-chart state, no
persistence; pattern mirrors scatter.js's spectrum-controls.

initWasmViewer replaces the WASM TSDB but sectionResponseCache,
chartsState, heatmapDataCache, and data.js's metadata cache survived
across loads. The new capture's sections short-circuited to the old
file's cached responses, rendering empty. First load was unaffected
because the caches start empty.

Reloading /viewer/#/overview with no ?demo/?capture param lands on
the FileUpload page but the URL still showed #/overview. Drop the
hash via history.replaceState when there's no capture in the URL,
so the location bar matches what's rendered.

Three things were stale across captures:

- data.js's cachedMetadata held the previous file's {minTime, maxTime},
  so fresh PromQL queries against the new TSDB ran on a window outside
  it and returned empty.
- The active section's cached payload (and therefore the topnav's
  time-range bar) wasn't refetched, because m.route's same-path
  short-circuit suppressed onmatch.
- The cachedView placeholder spins on a splash until loadSection
  resolves, so the eviction needs an explicit refetch kick rather
  than relying on a redraw to do it.

clearMetadataCache() + evict the visible section + loadSection(...) +
m.redraw() covers all three without touching chartsState or the rest
of sectionResponseCache (both of which spun in earlier attempts).

Place the Mean/Total checkbox where scatter.js's spectrum-controls sit: under the title,
left edge aligned with the echarts plot grid (past the y-axis gutter).
Previous top-right placement collided with the expand/select-pin icons
and read as right-aligned on half-width Mean cards.
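
The Mean/Total toggle from the commits above can be pictured with a small model. In the PR the toggle itself lives in the viewer's JavaScript; this Rust sketch only models the data involved. The field name `promql_query_total` and the `histogram_mean` / `histogram_sum` queries come from the commit message, while `MeanCard`, `View`, and `active` are hypothetical.

```rust
/// Hypothetical card spec carrying both the default and the alternate query.
#[derive(Debug, Clone, PartialEq)]
pub struct MeanCard {
    pub promql_query: String,
    /// Stashed alternate series; `None` means the card has no toggle.
    pub promql_query_total: Option<String>,
}

/// Which series the per-chart checkbox currently selects.
#[derive(Clone, Copy, PartialEq)]
pub enum View {
    Mean,
    Total,
}

impl MeanCard {
    /// Build a Mean/Total card for a histogram selector.
    pub fn for_selector(selector: &str) -> Self {
        Self {
            promql_query: format!("histogram_mean({selector})"),
            promql_query_total: Some(format!("histogram_sum({selector})")),
        }
    }

    /// Query to fetch and tooltip label to show for the active view.
    /// Falls back to the Mean series when no total query is stashed.
    pub fn active(&self, view: View) -> (&str, &'static str) {
        match (view, self.promql_query_total.as_deref()) {
            (View::Total, Some(total)) => (total, "Total"),
            _ => (self.promql_query.as_str(), "Mean"),
        }
    }
}
```

Keeping the alternate query on the spec means the chart can swap series on click without a second card, and the view state stays per-chart with no persistence, as the commit describes.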
@thinkingfish thinkingfish marked this pull request as ready for review June 1, 2026 05:38
@thinkingfish thinkingfish merged commit 27233f8 into iopsystems:main Jun 1, 2026
12 checks passed