fix(checkpoint): recalibrate global counters against ground truth on startup by L2ncE · Pull Request #253 · TencentCloud/TencentDB-Agent-Memory

L2ncE · 2026-06-25T05:16:16Z

Description

Fixes checkpoint counter drift (issue #157). The four additive-only global counters in CheckpointManager — total_memories_extracted, l0_conversations_count, total_processed, memories_since_last_persona — only ever increment. When memory-cleaner deletes expired data or JSONL files are pruned manually, these counters stay permanently inflated above the actual data and never self-correct. The inflated memories_since_last_persona can also spuriously trigger persona generation.

Adds CheckpointManager.recalibrate(), invoked once on gateway startup (per the issue's suggested fix), which re-syncs the four counters against authoritative sources:

| Counter | Source of truth |
| --- | --- |
| `total_memories_extracted` | `records/*.jsonl` total line count (append-only; includes dedup stale rows, same semantics as the counter) |
| `l0_conversations_count` | distinct `recordedAt` values in `conversations/*.jsonl` (= capture count) |
| `total_processed` | `store.countL0()` (falls back to the JSONL line count when the store is degraded) |
| `memories_since_last_persona` | `records/*.jsonl` lines with `updatedAt > last_persona_time` |
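The table above can be sketched as a set of pure count helpers over in-memory shard contents. This is a minimal sketch, not the PR's code. The `Shard` shape, the helper names (`jsonlLines`, `countDistinctRecordedAt`, `countUpdatedAfter`, `recalibrateCounters`), ISO-8601 timestamps, and the choice to skip malformed rows only in field-based counts are all assumptions. A degraded store is modelled as `storeL0Count === undefined`.

```typescript
// Hypothetical shard shape: file name plus raw file contents.
interface Shard {
  name: string;
  content: string;
}

interface RecalibratedCounters {
  total_memories_extracted: number;
  l0_conversations_count: number;
  total_processed: number;
  memories_since_last_persona: number;
}

// Non-empty lines of every *.jsonl shard; other files are filtered out.
function jsonlLines(shards: Shard[]): string[] {
  const lines: string[] = [];
  for (const shard of shards) {
    if (!shard.name.endsWith(".jsonl")) continue;
    for (const line of shard.content.split("\n")) {
      if (line.trim() !== "") lines.push(line);
    }
  }
  return lines;
}

// Parse one row; malformed JSON yields undefined instead of throwing.
function tryParse(line: string): Record<string, unknown> | undefined {
  try {
    const value: unknown = JSON.parse(line);
    return typeof value === "object" && value !== null
      ? (value as Record<string, unknown>)
      : undefined;
  } catch {
    return undefined;
  }
}

// Counter 2: number of distinct recordedAt values (assumed to be strings).
function countDistinctRecordedAt(conversationShards: Shard[]): number {
  const seen = new Set<string>();
  for (const line of jsonlLines(conversationShards)) {
    const row = tryParse(line);
    if (row && typeof row.recordedAt === "string") seen.add(row.recordedAt);
  }
  return seen.size;
}

// Counter 4: rows whose updatedAt is strictly after the last persona time.
// With no persona yet (undefined), every parseable row counts.
function countUpdatedAfter(recordShards: Shard[], lastPersonaTime: string | undefined): number {
  const cutoff = lastPersonaTime === undefined ? -Infinity : Date.parse(lastPersonaTime);
  let n = 0;
  for (const line of jsonlLines(recordShards)) {
    const row = tryParse(line);
    if (!row || typeof row.updatedAt !== "string") continue;
    const t = Date.parse(row.updatedAt);
    if (!Number.isNaN(t) && t > cutoff) n += 1;
  }
  return n;
}

function recalibrateCounters(
  recordShards: Shard[],
  conversationShards: Shard[],
  lastPersonaTime: string | undefined,
  storeL0Count: number | undefined, // undefined models a degraded store
): RecalibratedCounters {
  return {
    // Counter 1: raw line count, matching the append-only counter semantics.
    total_memories_extracted: jsonlLines(recordShards).length,
    l0_conversations_count: countDistinctRecordedAt(conversationShards),
    // Counter 3: prefer the store, fall back to conversation JSONL lines.
    total_processed: storeL0Count ?? jsonlLines(conversationShards).length,
    memories_since_last_persona: countUpdatedAfter(recordShards, lastPersonaTime),
  };
}
```

Keeping the helpers free of file I/O makes shard filtering and malformed-line handling easy to unit-test in isolation. A real implementation would read the files first and pass their contents in.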

Why JSONL for counters 1, 2 and 4: these counters accumulate write events (each store/update/merge appends a JSONL line), so the JSONL line count comes from the same source as the counter. The store deduplicates rows and would under-count. Counter 2 counts distinct `recordedAt` values and must read the JSONL files because the TCVDB adapter has no DISTINCT/GROUP BY capability. Counter 3 is a message count, identical in the store and in JSONL, so the existing `countL0()` is reused.
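The under-counting argument can be illustrated with a toy write log. This is a sketch under an assumption the PR does not spell out: the store deduplicates rows by record id. `WriteEvent` and the helper names are hypothetical.

```typescript
// Hypothetical write-event model: every store/update/merge appends one JSONL line.
type WriteOp = "store" | "update" | "merge";
interface WriteEvent {
  id: string; // record id, assumed to be the store's dedup key
  op: WriteOp;
}

// Additive counter as described: +1 per write event, never decremented.
function additiveCounter(events: WriteEvent[]): number {
  return events.reduce((n) => n + 1, 0);
}

// JSONL view: one appended line per write event.
function jsonlLineCount(events: WriteEvent[]): number {
  return events.length;
}

// Store view (assumption): one row per id after deduplication.
function dedupedStoreCount(events: WriteEvent[]): number {
  return new Set(events.map((e) => e.id)).size;
}

const events: WriteEvent[] = [
  { id: "m1", op: "store" },
  { id: "m1", op: "update" },
  { id: "m2", op: "store" },
  { id: "m2", op: "merge" },
];
```

For this log the counter and the JSONL line count are both 4, while the deduplicated store holds only 2 rows. Recalibrating from the store would therefore push the counter below its own semantics.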

Trigger timing: recalibrate runs inside the coreReady promise chain and is awaited, so the first agent_end (which does await coreReady) cannot overtake it. This keeps recalibration free of concurrency with the L2 repair path in pipeline-factory (which would otherwise Math.max the counters back to stale values).
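The ordering argument above can be modelled with a single counter. This is a simplified, hypothetical model and not the gateway's code. `recalibrate` and `repairWithMax` are stand-ins, and the repair path is reduced to "read a snapshot, later write `Math.max(current, snapshot)`". The unawaited variant reads its snapshot before recalibration finishes and writes after it.

```typescript
interface CheckpointState {
  memories: number;
}

// Stand-in for CheckpointManager.recalibrate(): overwrite with ground truth.
async function recalibrate(state: CheckpointState, groundTruth: number): Promise<void> {
  await Promise.resolve(); // simulate asynchronous JSONL/store reads
  state.memories = groundTruth;
}

// Stand-in for a repair step that merges an earlier snapshot back with Math.max.
function repairWithMax(state: CheckpointState, snapshot: number): void {
  state.memories = Math.max(state.memories, snapshot);
}

// Final counter value after startup plus the first agent_end.
async function simulateStartup(agentEndAwaitsCoreReady: boolean): Promise<number> {
  const state: CheckpointState = { memories: 5 }; // inflated after pruning; truth is 2
  // Recalibration is chained onto coreReady.
  const coreReady: Promise<void> = Promise.resolve().then(() => recalibrate(state, 2));

  if (agentEndAwaitsCoreReady) await coreReady; // agent_end waits for recalibration
  const snapshot = state.memories; // repair path reads the counter
  await coreReady; // recalibration completes before the write in both scenarios
  repairWithMax(state, snapshot);
  return state.memories;
}
```

With the await the result is 2. Without it the stale snapshot (5) wins the `Math.max` and the drift comes back.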

Incremental extraction gates on per-session cursors (`last_l1_cursor` / `last_extraction_updated_time`), not on these global counters, so counter drift cannot cause records to be skipped. Acceptance criterion #3 of the issue is therefore already satisfied by the existing design.
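The cursor-gating point can be sketched as a selection function whose only inputs are the rows and the session cursor. This is a hypothetical sketch. Only the field name `last_extraction_updated_time` comes from the PR. The row shape, the helper name, and the epoch-millisecond type are assumptions.

```typescript
// Hypothetical L0 row; updatedAt assumed to be epoch milliseconds.
interface L0Row {
  id: string;
  updatedAt: number;
}

// Hypothetical per-session cursor (field name from the PR, type assumed).
interface SessionCursor {
  last_extraction_updated_time: number;
}

// Rows newer than the cursor are extracted. No global counter is an input,
// so an inflated counter cannot hide rows from extraction.
function selectForExtraction(rows: L0Row[], cursor: SessionCursor): L0Row[] {
  return rows.filter((r) => r.updatedAt > cursor.last_extraction_updated_time);
}

const rows: L0Row[] = [
  { id: "a", updatedAt: 100 },
  { id: "b", updatedAt: 200 },
  { id: "c", updatedAt: 300 },
];
```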

Related Issue

Fix #157

Change Type

Bug fix
New feature
Documentation update
Code optimization

Self-test Checklist

Verified locally
No existing features affected

Additional Notes

Issue repro coverage: the manual JSONL pruning repro from the issue is covered by the unit test `issue repro: manual JSONL pruning`: seed 5 L1 lines → counter = 5 → prune to 2 lines → counter still 5 (drift) → recalibrate → counter = 2. A cleaner-style "delete expired shard file" case is also covered.
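Both repro scenarios can be traced with a small in-memory harness. This is a hypothetical sketch and not the PR's vitest suite. `DriftHarness` and the shard names are made up, and recalibration is reduced to counter 1 (total JSONL line count).

```typescript
// Hypothetical harness: shard name -> JSONL lines, plus one additive counter.
class DriftHarness {
  readonly shards = new Map<string, string[]>();
  total_memories_extracted = 0;

  // Normal write path: append a line and bump the counter.
  append(shard: string, line: string): void {
    const lines = this.shards.get(shard) ?? [];
    lines.push(line);
    this.shards.set(shard, lines);
    this.total_memories_extracted += 1;
  }

  // Manual pruning: keep the first `keep` lines; the counter is not touched.
  prune(shard: string, keep: number): void {
    this.shards.set(shard, (this.shards.get(shard) ?? []).slice(0, keep));
  }

  // Cleaner-style deletion of a whole expired shard file.
  deleteShard(shard: string): void {
    this.shards.delete(shard);
  }

  // Recalibrate counter 1 against the total JSONL line count.
  recalibrate(): void {
    let total = 0;
    this.shards.forEach((lines) => {
      total += lines.length;
    });
    this.total_memories_extracted = total;
  }
}

// Returns [after seeding, after pruning, after recalibrate].
function manualPruningRepro(): number[] {
  const h = new DriftHarness();
  for (let i = 0; i < 5; i++) h.append("records/day1.jsonl", JSON.stringify({ id: i }));
  const afterSeed = h.total_memories_extracted;
  h.prune("records/day1.jsonl", 2);
  const afterPrune = h.total_memories_extracted;
  h.recalibrate();
  return [afterSeed, afterPrune, h.total_memories_extracted];
}

// Deletes a 3-line shard out of 3 + 2 lines, then recalibrates.
function expiredShardRepro(): number {
  const h = new DriftHarness();
  for (let i = 0; i < 3; i++) h.append("records/old.jsonl", "{}");
  for (let i = 0; i < 2; i++) h.append("records/new.jsonl", "{}");
  h.deleteShard("records/old.jsonl");
  h.recalibrate();
  return h.total_memories_extracted;
}
```

`manualPruningRepro()` yields `[5, 5, 2]`, matching the sequence above. `expiredShardRepro()` yields 2.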
Scope note: the repro in issue #157 also mentions deleting `pipeline_states` entries. `pipeline_states` holds per-session incremental cursors (positional semantics), not global counters. Deleting them causes re-processing, not counter drift, and is a separate concern outside this PR. Only JSONL pruning affects the four counters, and that is fixed here.
Known trade-off: recalibrate runs only at startup (as the issue requests). Between a memory-cleaner run and the next restart the counters remain inflated; this is acceptable since drift is gradual and the next startup corrects it.
Test suite: 84 tests pass (npx vitest run), including new unit tests for the count helpers (shard filtering, malformed-line handling, B2 bad-row invariant) and integration tests reproducing the drift and correction.
CHANGELOG: updated under [Unreleased] → 🐛 Bug 修复 (Bug fixes).


Maxwell-Code07 · 2026-06-25T15:17:27Z

@L2ncE Welcome as a first-time contributor! Checkpoint counter drift (#157) is a long-standing issue — the recalibrate-on-startup approach is clean and well-implemented. Thanks for the contribution! 👍

Commit 7e5974e: fix(checkpoint): recalibrate global counters against ground truth on startup
Signed-off-by: L2ncE <llance_24@foxmail.com>

L2ncE wants to merge 1 commit into TencentCloud:main from L2ncE:fix/checkpoint-recalibrate

L2ncE commented Jun 25, 2026

Uh oh!

Maxwell-Code07 commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

L2ncE commented Jun 25, 2026

Description

Related Issue

Change Type

Self-test Checklist

Additional Notes

Uh oh!

Maxwell-Code07 commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants