perf: stream per-file aggregation to eliminate graph recompute RSS sawtooth by aiexkwan · Pull Request #3 · Nanako0129/TokenBar

aiexkwan · 2026-06-12T20:02:01Z

Problem

Follow-up to #1 / #2. With the v1.0.2 caching fixes in place, the remaining resource issue is the graph recompute path: every refresh materializes the full session history as a Vec<UnifiedMessage> (~1M messages on this setup) before aggregating, producing a transient ~1GB allocation. RSS sawtooths between ~1.25GB and ~1.72GB all day (sampling stddev 162.5MB over 5 min).

Change

Replace materialize-then-aggregate with a per-file streaming fold — one scan pass, no full-history Vec:

- StreamingAggregator (aggregator.rs): folds &UnifiedMessage into daily buckets one at a time. Dedup semantics preserved exactly: cross-file first-seen-wins seen-sets (claude/opencode/codex/hermes), trae keep-latest-per-session buffer (timestamp, dedup_key tiebreak) flushed after all lanes.
- SessionizeAccumulator (sessionize.rs): streaming replacement for sessionize() on the hot path. It keeps per-(client, session_id) timestamp vectors + token sums instead of holding every message. Output is parity-tested against sessionize().
- scan_messages_streaming (lib.rs): cache-aware driver that iterates each file's cached messages by reference (no cached.messages.clone()), applies pricing on the fly, runs the dedup gate once at the driver level, and feeds both accumulators in a single pass. load_or_parse_source caching, cache writeback, and Gemini invalidate_cache semantics are preserved (verified lane-by-lane against the old path).
- All report consumers (graph / model / monthly / hourly / time_metrics) switched off the materializing path. The legacy parse_all_messages_with_pricing_with_env_strategy remains only for the FFI parse_local_unified_messages_resolved (intentional Vec, TODO-tagged).
- FFI surface unchanged: git diff main -- crates/tb_core_ffi/src/ has zero extern "C" line changes.
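The fold described above can be sketched roughly as follows. This is a minimal illustration, not the code in aggregator.rs: `Msg`, `msg`, `aggregate_files`, the field names, the day-index bucketing, and the direction of the trae tiebreak (larger `(timestamp, dedup_key)` wins) are all assumptions made for the example.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical, trimmed-down stand-in for the real `UnifiedMessage`.
#[derive(Clone, Debug)]
pub struct Msg {
    pub client: &'static str,
    pub session_id: String,
    pub dedup_key: Option<String>,
    pub timestamp: i64, // seconds since the Unix epoch (assumed)
    pub tokens: u64,
}

/// Small constructor used only to keep the examples short.
pub fn msg(client: &'static str, session_id: &str, dedup_key: Option<&str>, timestamp: i64, tokens: u64) -> Msg {
    Msg {
        client,
        session_id: session_id.to_owned(),
        dedup_key: dedup_key.map(str::to_owned),
        timestamp,
        tokens,
    }
}

#[derive(Default)]
pub struct StreamingAggregator {
    daily: BTreeMap<i64, u64>,                    // day index -> token total
    seen: HashMap<&'static str, HashSet<String>>, // per-client first-seen-wins sets
    trae_latest: HashMap<String, Msg>,            // session_id -> latest trae message
}

impl StreamingAggregator {
    /// Fold one message by reference; nothing but the trae buffer retains it.
    pub fn feed(&mut self, m: &Msg) {
        if m.client == "trae" {
            // Keep-latest per session; ties on timestamp fall back to dedup_key.
            let replace = match self.trae_latest.get(&m.session_id) {
                None => true,
                Some(cur) => (m.timestamp, &m.dedup_key) > (cur.timestamp, &cur.dedup_key),
            };
            if replace {
                self.trae_latest.insert(m.session_id.clone(), m.clone());
            }
            return;
        }
        if let Some(key) = &m.dedup_key {
            // `insert` returns false when the key was already present: first seen wins.
            if !self.seen.entry(m.client).or_default().insert(key.clone()) {
                return;
            }
        }
        self.add(m.timestamp, m.tokens);
    }

    fn add(&mut self, timestamp: i64, tokens: u64) {
        *self.daily.entry(timestamp.div_euclid(86_400)).or_insert(0) += tokens;
    }

    /// Flush the trae buffer after all lanes have been fed, then return the buckets.
    pub fn finalize(mut self) -> BTreeMap<i64, u64> {
        let buffered: Vec<Msg> = self.trae_latest.drain().map(|(_, m)| m).collect();
        for m in &buffered {
            self.add(m.timestamp, m.tokens);
        }
        self.daily
    }
}

/// One pass over per-file message slices, with no full-history Vec.
pub fn aggregate_files(files: &[Vec<Msg>]) -> BTreeMap<i64, u64> {
    let mut agg = StreamingAggregator::default();
    for file in files {
        for m in file {
            agg.feed(m);
        }
    }
    agg.finalize()
}
```

In this sketch the only per-message state that outlives `feed` is the dedup keys and one buffered message per trae session, which is why peak memory no longer scales with a materialized history.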

Measurements (16k-file setup, active Claude session writing JSONL throughout)

5s sampling, 5-minute windows:

| Metric | v1.0.2 (baseline) | this PR |
| --- | --- | --- |
| RSS range | 1.25GB ↔ 1.72GB sawtooth | 576–639MB steady |
| RSS sampling stddev | 162.5MB | 23.2MB |
| CPU mean / max | 17.1% / 181.4% | 13.2% / 130.5% |

Extended 50-minute monitor (5 rounds, 10-min spacing, final build): RSS envelope 551MB–1213MB, settling at ~645MB by rounds 4–5; no >180% CPU bursts; no sawtooth. The residual floor is the resident message cache (STORE_MEMO), which is prior design and out of scope here.

Also re-ran steady-state on v1.0.2 main as requested in #2: CPU mean 17.1%, 82% of samples ≤22% — the round-1 numbers hold.

Tests

- cargo test workspace: 872 passed, 0 failed (24 new: dedup scenario tests with hardcoded expectations, A/B parity vs aggregate_by_date, snapshot-isolation mid-iteration mutation test, SessionizeAccumulator parity suite)
- clippy clean across the workspace; make bundle + nm symbol check on the shipped binary

🤖 Generated with Claude Code

…RSS sawtooth

Replace full-history Vec<UnifiedMessage> materialization (~1M msgs, ~1GB transient per graph recompute) with per-file streaming fold:

- StreamingAggregator: dedup-aware daily fold (cross-file seen-set + trae keep-latest buffer), feed/finalize API
- SessionizeAccumulator: streaming session intervals (timestamps + token sums only), replaces full-slice sessionize() on the hot path
- scan_messages_streaming: cache-aware per-file reference iteration, driver-level dedup gate, dual-sink single pass; preserves load_or_parse_source caching and Gemini invalidate_cache semantics
- All report consumers (graph/model/monthly/hourly/time_metrics) switched off the materializing path; legacy path retained only for FFI parse_local_unified_messages_resolved (TODO-tagged)

Measured on the 16k-file setup over a 50-min window: RSS sawtooth 1.25-1.72GB -> steady 0.6-1.2GB envelope (settles ~645MB), sampling stddev 162.5MB -> 23.2MB, no >180% CPU bursts. FFI surface unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
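As a rough illustration of the "timestamps + token sums only" idea behind SessionizeAccumulator, the sketch below keeps one small accumulator per (client, session_id) pair. The type and field names, the `feed` signature, and the shape of the output (`SessionSummary` with first/last timestamp) are hypothetical; the real sessionize() output may differ, for example in how it derives intervals.

```rust
use std::collections::HashMap;

/// Per-session state: timestamps and a running token sum, no message bodies.
#[derive(Default)]
struct Acc {
    timestamps: Vec<i64>,
    tokens: u64,
}

/// Hypothetical output row for the example.
#[derive(Debug, PartialEq, Eq)]
pub struct SessionSummary {
    pub client: String,
    pub session_id: String,
    pub start: i64,
    pub end: i64,
    pub messages: usize,
    pub tokens: u64,
}

#[derive(Default)]
pub struct SessionizeAccumulator {
    sessions: HashMap<(String, String), Acc>,
}

impl SessionizeAccumulator {
    /// Record one message's timestamp and token count under its session key.
    pub fn feed(&mut self, client: &str, session_id: &str, timestamp: i64, tokens: u64) {
        let acc = self
            .sessions
            .entry((client.to_owned(), session_id.to_owned()))
            .or_default();
        acc.timestamps.push(timestamp);
        acc.tokens += tokens;
    }

    /// Turn the accumulated state into one summary per session, sorted by key
    /// so the output order is deterministic.
    pub fn finalize(self) -> Vec<SessionSummary> {
        let mut out: Vec<SessionSummary> = self
            .sessions
            .into_iter()
            .filter_map(|((client, session_id), acc)| {
                let start = *acc.timestamps.iter().min()?;
                let end = *acc.timestamps.iter().max()?;
                Some(SessionSummary {
                    client,
                    session_id,
                    start,
                    end,
                    messages: acc.timestamps.len(),
                    tokens: acc.tokens,
                })
            })
            .collect();
        out.sort_by(|a, b| (&a.client, &a.session_id).cmp(&(&b.client, &b.session_id)));
        out
    }
}
```

Because messages can arrive out of order across files, the sketch defers any ordering work to `finalize` and stores only an `i64` per message in the meantime.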

Follow-up to PR Nanako0129#3. scan_messages_streaming shared one seen_keys set across copilot/codebuff/kimi/gemini and all simple file lanes, where the old report path did not dedup these lanes at all. Two problems:

- Cross-client false collisions: copilot keys are namespaced (trace:span), but codebuff (upstream message id) and kimi (message_id) use raw upstream ids with no client prefix, so an identical id across two clients would silently drop one real message.
- Inconsistency: claude/codex/hermes/opencode already use per-client sets; the simple lanes did not follow that design.

Each lane now owns its dedup set. This preserves the (correct, and likely intended) intra-client dedup (copilot spans that appear across multiple telemetry sources still collapse) while removing the cross-client coupling. New regression test builds a kimi and a codebuff message sharing dedup_key "COLLIDE" and asserts both survive; it fails against the shared-set version.

Also: the new test fixtures (streaming_msg, parity_msg, snapshot_msg) tripped clippy::too_many_arguments, which is deny-level via #![deny(clippy::all)], breaking `cargo clippy --all-targets`. Added the #[allow] the existing fixtures already carry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Nanako0129 · 2026-06-13T07:11:42Z

Thanks @aiexkwan. Validated on a rebase onto main: full cargo test workspace green, clippy --all-targets clean, plus our --selftest/--smoke. The RSS sawtooth is gone here too — nice result.

I found one correctness issue and pushed a fix straight to this branch (0a3cc6a) so it stays part of this PR. scan_messages_streaming shares a single seen_keys set across the copilot/codebuff/kimi/gemini and simple file lanes, but the old report path didn't dedup those lanes at all — so "dedup semantics preserved exactly" isn't quite right for them. It splits into two cases:

- copilot: applying its trace:span dedup_key is actually a fix, because the old path double-counted spans that show up in more than one telemetry source. Good change.
- codebuff / kimi: these use raw upstream message ids as the dedup_key, with no client namespace. With one shared set, an identical id across two clients silently drops a real message.

The fix gives each lane its own dedup set, matching the claude/codex/hermes/opencode lanes that already work this way. That keeps the copilot dedup and removes the cross-client coupling. Added a regression test (a kimi and a codebuff message sharing dedup_key "COLLIDE", both must survive — it fails against the shared set). Also added the #[allow(clippy::too_many_arguments)] the new test fixtures needed; without it clippy --all-targets fails under the crate's #![deny(clippy::all)].
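The difference between the shared set and per-lane sets can be shown with a small, self-contained sketch. `LaneMsg`, `count_shared`, and `count_per_lane` are hypothetical names for illustration only; they are not the functions in lib.rs, and the real regression test is structured differently.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical minimal message: just the lane (client) and its dedup key.
pub struct LaneMsg {
    pub client: &'static str,
    pub dedup_key: &'static str,
}

/// Shared-set behaviour: one `seen` set for every lane, so equal keys from
/// different clients collide and the later message is dropped.
pub fn count_shared(msgs: &[LaneMsg]) -> usize {
    let mut seen: HashSet<&str> = HashSet::new();
    msgs.iter().filter(|m| seen.insert(m.dedup_key)).count()
}

/// Per-lane behaviour: each client owns its own set, so dedup only applies
/// within a client.
pub fn count_per_lane(msgs: &[LaneMsg]) -> usize {
    let mut seen: HashMap<&str, HashSet<&str>> = HashMap::new();
    msgs.iter()
        .filter(|m| seen.entry(m.client).or_default().insert(m.dedup_key))
        .count()
}

/// Example input: a kimi/codebuff key collision plus a duplicated copilot span.
pub fn example_messages() -> Vec<LaneMsg> {
    vec![
        LaneMsg { client: "kimi", dedup_key: "COLLIDE" },
        LaneMsg { client: "codebuff", dedup_key: "COLLIDE" },
        LaneMsg { client: "copilot", dedup_key: "trace:span" },
        LaneMsg { client: "copilot", dedup_key: "trace:span" },
    ]
}
```

With this input the shared set keeps only one of the two "COLLIDE" messages, while the per-lane version keeps both and still collapses the duplicated copilot span.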

One heads-up worth recording: this changes historical graph/model/hourly counts for anyone who had duplicate copilot spans (now deduped). The agents report still runs the old materialized path, so it won't match for those clients until it migrates too — I'll open a separate issue to track that.

This aggregation path and the message_cache.rs work from #2 are both good candidates to forward upstream to junhoyeo/tokscale. Merging via rebase-and-merge to keep your authorship.

…able

PR #3 added a large divergence from upstream tokscale (streaming per-file aggregation across aggregator.rs/lib.rs/sessionize.rs) that the merge missed adding to vendor/README.md. A future re-vendor from junhoyeo/tokscale must re-apply it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

aiexkwan mentioned this pull request Jun 12, 2026

perf: stop re-reading and re-aggregating unchanged session history #2

Merged

Nanako0129 merged commit 0752e35 into Nanako0129:main Jun 13, 2026
1 check passed

Nanako0129 mentioned this pull request Jun 13, 2026

Agents report still uses the non-deduped materialized path — inconsistent with graph/model/hourly after #3 #6

Open

Nanako0129 merged 2 commits into Nanako0129:main from aiexkwan:perf/streaming-aggregation
