fix(checkpoint): recalibrate global counters against ground truth on startup #253
Open
L2ncE wants to merge 1 commit into
…startup

Signed-off-by: L2ncE <llance_24@foxmail.com>
Description
Fixes checkpoint counter drift (issue #157). The four additive-only global counters in
`CheckpointManager` (`total_memories_extracted`, `l0_conversations_count`, `total_processed`, `memories_since_last_persona`) only ever increment. When `memory-cleaner` deletes expired data or JSONL files are pruned manually, these counters stay inflated above the actual data and never self-correct. The inflated `memories_since_last_persona` can also spuriously trigger persona generation.

Adds
`CheckpointManager.recalibrate()`, invoked once on gateway startup (as suggested in the issue), which re-syncs the four counters against authoritative sources:

| Counter | Ground truth |
| --- | --- |
| `total_memories_extracted` | Total line count of `records/*.jsonl` (append-only; includes stale rows left by dedup, matching the counter's semantics) |
| `l0_conversations_count` | Number of distinct `recordedAt` values in `conversations/*.jsonl` (= capture count) |
| `total_processed` | `store.countL0()` (falls back to the JSONL line count when the store is degraded) |
| `memories_since_last_persona` | Lines in `records/*.jsonl` with `updatedAt > last_persona_time` |

Why JSONL for counters 1/2/4: they accumulate write events (each store/update/merge appends a JSONL line), so the JSONL line count comes from the same source as the counter, whereas the store deduplicates and would under-count. Counter 2 uses distinct
`recordedAt` values because the TCVDB adapter has no DISTINCT/group-by capability, so the count must be computed by reading the JSONL files. Counter 3 is a message count, identical in the store and in JSONL, so the existing `countL0()` is reused.

Trigger timing: `recalibrate()` runs inside the
`coreReady` promise chain and is awaited, so the first `agent_end` handler (which does `await coreReady`) cannot overtake it. This keeps recalibration from running concurrently with the L2 repair path in `pipeline-factory`, which would otherwise `Math.max` the counters back to their stale values.

Incremental extraction gates on per-session cursors (
`last_l1_cursor`/`last_extraction_updated_time`), not on these global counters, so counter drift cannot cause records to be skipped; acceptance criterion #3 of the issue is therefore already satisfied by the existing design.

Related Issue
Fix #157
Change Type
Self-test Checklist
Additional Notes
- … (issue repro: manual JSONL pruning): seed 5 L1 lines → counter = 5 → prune to 2 lines → counter still 5 (drift) → recalibrate → counter = 2. A cleaner-style "delete expired shard file" case is also covered.
- … `pipeline_states` entries. `pipeline_states` holds per-session incremental cursors (positional semantics), not global counters; deleting them causes re-processing rather than counter drift and is a separate concern outside this PR. Only JSONL pruning affects the four counters, and that is what this PR fixes.
- … `memory-cleaner` run and the next restart, the counters remain inflated; this is acceptable because drift is gradual and the next startup corrects it.
- … `npx vitest run`), including new unit tests for the count helpers (shard filtering, malformed-line handling, B2 bad-row invariant) and integration tests that reproduce the drift and its correction.
- … `[Unreleased]` → `🐛 Bug 修复` (Bug fixes).
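
The ground-truth table in the description can be made concrete with a minimal in-memory TypeScript sketch. It assumes each shard is passed in as its file contents, that `updatedAt` and `last_persona_time` are epoch-millisecond numbers, and that `countL0()` rejects when the store is degraded. The helper names (`countLines`, `countDistinct`, `countUpdatedAfter`, `tryParse`), the `L0Store` interface, and the malformed-line policy are hypothetical, not the PR's actual code, and shard filtering is not modeled.

```typescript
// Hypothetical shapes: the real CheckpointManager, store, and shard readers are not shown in the PR.
interface Counters {
  total_memories_extracted: number;
  l0_conversations_count: number;
  total_processed: number;
  memories_since_last_persona: number;
}

interface L0Store {
  // Assumed to reject when the store is degraded.
  countL0(): Promise<number>;
}

// Non-blank lines of one JSONL shard (each shard is passed in as its file contents).
function jsonlLines(shard: string): string[] {
  return shard.split("\n").filter((line) => line.trim() !== "");
}

// Parse one line into an object; malformed JSON yields undefined instead of throwing.
function tryParse(line: string): Record<string, unknown> | undefined {
  try {
    const value: unknown = JSON.parse(line);
    return typeof value === "object" && value !== null
      ? (value as Record<string, unknown>)
      : undefined;
  } catch {
    return undefined;
  }
}

// Counter 1 (and the degraded fallback for counter 3): raw line count across shards.
// Assumption: every non-blank line counts, parseable or not, since the counter tracks appends.
function countLines(shards: string[]): number {
  return shards.reduce((sum, shard) => sum + jsonlLines(shard).length, 0);
}

// Counter 2: number of distinct values of `field` (e.g. recordedAt); malformed rows are skipped.
function countDistinct(shards: string[], field: string): number {
  const seen = new Set<unknown>();
  for (const shard of shards) {
    for (const line of jsonlLines(shard)) {
      const row = tryParse(line);
      if (row !== undefined && row[field] !== undefined) seen.add(row[field]);
    }
  }
  return seen.size;
}

// Counter 4: rows whose updatedAt (assumed epoch ms) is strictly after `since`.
function countUpdatedAfter(shards: string[], since: number): number {
  let n = 0;
  for (const shard of shards) {
    for (const line of jsonlLines(shard)) {
      const updatedAt = tryParse(line)?.updatedAt;
      if (typeof updatedAt === "number" && updatedAt > since) n++;
    }
  }
  return n;
}

// Recompute all four counters from ground truth instead of trusting persisted values.
async function recalibrate(
  recordShards: string[],
  conversationShards: string[],
  store: L0Store,
  lastPersonaTime: number,
): Promise<Counters> {
  let totalProcessed: number;
  try {
    totalProcessed = await store.countL0();
  } catch {
    // Degraded store: fall back to the L0 JSONL line count (assumed to be conversations/*.jsonl).
    totalProcessed = countLines(conversationShards);
  }
  return {
    total_memories_extracted: countLines(recordShards),
    l0_conversations_count: countDistinct(conversationShards, "recordedAt"),
    total_processed: totalProcessed,
    memories_since_last_persona: countUpdatedAfter(recordShards, lastPersonaTime),
  };
}
```

Running it on the repro from the notes (five record lines pruned to two) yields `total_memories_extracted: 2`, because the result depends only on the shards on disk and never on the previously persisted value of 5.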
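
The trigger-timing argument in the description can be illustrated with a small sketch of the promise chain. Only `coreReady` and `agent_end` are names from the PR; `initCore`, `recalibrateCounters`, `onAgentEnd`, and the `events` log are hypothetical stand-ins, and the real gateway wiring may differ.

```typescript
// Hypothetical stand-ins for the gateway wiring; only `coreReady` and `agent_end` come from the PR.
const events: string[] = [];

async function initCore(): Promise<void> {
  events.push("core:init");
}

async function recalibrateCounters(): Promise<void> {
  // Yield a few microtask turns to mimic a slow JSONL scan.
  for (let i = 0; i < 3; i++) await Promise.resolve();
  events.push("recalibrate:done");
}

// Recalibration is chained into coreReady and its promise is returned (awaited),
// so coreReady does not resolve until the counters have been re-synced.
const coreReady: Promise<void> = initCore().then(() => recalibrateCounters());

// Every agent_end handler waits on coreReady before touching the counters,
// so it can never run ahead of recalibration.
async function onAgentEnd(): Promise<void> {
  await coreReady;
  events.push("agent_end:handled");
}
```

Returning the promise from the `.then` callback is the key detail. A fire-and-forget variant such as `initCore().then(() => { void recalibrateCounters(); })` would let `coreReady` resolve first, so an early `agent_end` could reach the L2 repair path while recalibration is still running; if recalibration lowered a counter from 5 to 2, a concurrent `Math.max` against the stale 5 would put the drift right back.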