Skip to content

feat(0.8.0): memory, skills, coverage release#25

Merged
blackaxgit merged 48 commits into
mainfrom
feat/0.8.0-memory-skills-coverage
May 19, 2026
Merged

feat(0.8.0): memory, skills, coverage release#25
blackaxgit merged 48 commits into
mainfrom
feat/0.8.0-memory-skills-coverage

Conversation

@blackaxgit

Copy link
Copy Markdown
Owner

Summary

The "memory and quality" release. 17 atomic commits, 1218 tests passing, 0 failing.

Eight user-asked features delivered:

  1. Auto-recall 95% accuracy: RRF (k=60) + bge-reranker-v2-m3 + multiplicative time-decay (30d half-life) + p70 percentile gate
  2. Explicit save preserved + opt-in auto-summarize mode (Stop hook + summarize.rs)
  3. Six model-discoverable skills under plugin/.claude-plugin/ (2026 schema)
  4. Pin recent sessions (opt-in always-on context)
  5. PostToolUse aggregator with tool_events table (60s windowed dedup, atomic UPSERT)
  6. Test coverage push: workspace ignore-regex + 17 ratatui TestBackend snapshots
  7. Mutation testing CI (cargo-mutants v27 on 7 hot modules)
  8. clx config-trust file-hash trustlist for project configs
  9. Refactor for testability: handle_event router + dashboard reducer (Elm pattern)
  10. Contract tests: 7 insta-snapshot fixtures for hook JSON envelopes

Architecture (Wave 4b)

Recall pipeline Domain layer no longer imports Storage / LlmClient / EmbeddingStore directly. Two new ports:

  • recall::ports::SnapshotRepo — implemented by storage::recall_repo::StorageSnapshotRepo
  • recall::ports::QueryEmbedder — implemented by recall::adapters::LlmQueryEmbedder

RecallEngine depends only on trait references. Layering proof in recall/mod.rs docstring.

Decisions (user-approved 2026-05-16)

# Question Decision
1 Reranker default true in 0.8.0 (ships with clx model fetch background prefetch + 250 ms graceful degradation)
2 tool_events retention 30d default, configurable via retention.tool_events_days
3 Auto-summarize IN 0.8.0, opt-in (default off)
4 Coverage CI gate warn-only in 0.8.0, hard-fail flip in 0.8.1
5 Golden set synthetic-only (no user content, no PHI)

Two-round comprehensive review

Round 1 (Codex) found and self-fixed 2 BLOCKERs:

  • AutoRecallConfig was missing 6 ranking knobs (rrf_enabled, rrf_k, etc.) — production callers always used Default
  • Auto-summarize counted event_type='message' rows that are never written — fix counts tool_events

Round 2 (4 parallel agents) closed all remaining findings:

  • JSON-recursive secret redaction
  • SHA-256/integrity gate on clx model fetch
  • Reranker cold-load timeout (ensure_loaded moved into spawn_blocking)
  • clx model fetch --force lock ordering inversion
  • tool_events upsert race (schema v7 UNIQUE INDEX + INSERT ... ON CONFLICT DO UPDATE)
  • stop_auto_summary double-write race
  • fastembed 4 → 5.13.4 (ort 2.0 stable)
  • criterion 0.5 → 0.8.2
  • Plugin schema cleanup (drop mcp_servers: {}, drop SKILL version:, add author/license)
  • Recall Domain port extraction (BLOCKER architecture finding)

Schema migrations

  • v6: adds tool_events table + 2 indexes (additive, no destructive changes)
  • v7: adds UNIQUE INDEX tool_events_dedup_idx for atomic upsert (additive)

Test plan

  • cargo test --workspace passes locally on macOS (verified: 1218 pass, 0 fail, 10 ignored)
  • cargo check --workspace clean (verified)
  • Live recall smoke on Azure-routed install: clx recall "previous discussion on Azure backend" returns hits within 500 ms p95
  • clx model fetch downloads bge-reranker-v2-m3 (568 MB) and writes .ready only after integrity gate
  • clx config-trust add <repo>/.clx/config.yaml registers hash; edit the file, recall re-trusts only after clx config-trust add rerun
  • Six skills load via Claude Code 2026 plugin discovery (claude /skills list)
  • clx maintenance trim removes tool_events rows older than 30d
  • Auto-summarize with memory.auto_summarize.enabled: true writes one AutoSummary snapshot per 5 turns

Deferred to 0.8.1 (documented)

  • summarize.rs Domain port extraction
  • policy/{rules,llm}.rs Domain port extraction
  • query_percentile_gate duplication (subagent + mcp/recall)
  • Hard-fail coverage CI gate (warn-only in 0.8.0)
  • User-derived RAGAS golden set layer

Risks

  • bge-reranker-v2-m3 is a 568 MB one-time download — first recall after upgrade may use RRF-only until prefetch completes; set auto_recall.reranker_enabled: false to opt out
  • Plugin path migration is a manual one-shot via plugin/scripts/migrate.sh; old plugin/skills/ removed in 0.9.0
  • fastembed v5 ships with ort 2.0.0-rc.12 transitively (waiting on GA-stable in a future fastembed point release)

Branch state

17 commits, all atomic
cargo check: clean
cargo test --workspace: 1218 pass / 0 fail / 10 ignored
working tree: clean

blackaxgit added 30 commits May 16, 2026 19:40
- F: fastembed bge-reranker-v2-m3 cross-encoder stage in recall pipeline
  with 250 ms timeout + graceful RRF-only fallback; clx model fetch/status/list
  CLI; background prefetch from UserPromptSubmit once per process
- G: auto_recall.pin_recent_sessions opt-in injects last-N session summaries
  with current-session self-pin guard; new Storage::recent_session_summaries
- H: memory.auto_summarize opt-in Stop hook rolls up N-turn windows into
  AutoSummary snapshots via configured chat LLM; deterministic template
  fallback when LLM unavailable; new SnapshotTrigger::AutoSummary
…iring

- I: workspace llvm-cov ignore-regex (event.rs, main.rs, runtime.rs) + 17
  ratatui TestBackend + insta snapshots for dashboard/ui/detail.rs and
  dashboard/settings/render.rs
- J: cargo-mutants v27 configuration (mutants.toml) + weekly baseline
  workflow + PR-diff workflow + docs/mutation-testing.md
- K: 30-pair synthetic RAGAS golden set (no PHI, no user content) +
  criterion bench benches/recall_accuracy.rs over rrf_enabled true/false
- L: clx config-trust file-hash trustlist (parallel to existing PR #15
  trust-mode) — bypasses inert filter at project.rs:86 for trusted hashes;
  ~/.clx/trusted_configs.json with 0600 mode; 15 unit + 4 integration tests
- M: clx-hook main.rs slimmed 196 to 126 LoC, delegates to lib router;
  dashboard event loop now drives reducer via all DashboardEvent variants
  (Key, Resize, Tick, Quit); dead_code warnings on Resize/Tick/Quit gone
- Add rrf_enabled, rrf_k, time_decay_half_life_days, percentile_gate,
  reranker_enabled, reranker_timeout_ms to AutoRecallConfig with defaults
- Map AutoRecallConfig fields into RecallQueryConfig in hook subagent
- Attach FastembedReranker in hook and MCP recall when reranker_enabled=true
Replaces event_type='message' query that never matched production
writes with a count of tool_events rows for the current session.
…s, plugin schema

- A (security): redact_json_value walks serde_json::Value recursively,
  redacting 20 sensitive key patterns case-insensitive; verify_model_dir_complete
  enforces required files non-zero before .ready sentinel; clx model fetch
  --force now acquires lock before destructive remove_dir_all
- B (concurrency): migrate_to_v7 adds UNIQUE INDEX tool_events_dedup_idx
  on (session_id, tool_name, IFNULL(target,''), window_end_unix/60);
  append_or_extend_tool_event rewritten as INSERT ... ON CONFLICT DO UPDATE
  for atomic upsert; stop_auto_summary re-reads last AutoSummary timestamp
  before write to skip duplicate snapshots on rapid Stop hooks
- C (async + deps): FastembedReranker::score lazy-loads inside the same
  spawn_blocking as the rerank call so tokio::time::timeout governs cold
  load; fastembed 4 -> 5 (ort 2.0 stable); criterion 0.5 -> 0.8
- D (plugin schema): plugin.json drops non-spec mcp_servers, adds author
  + license; all 6 SKILL.md strip non-spec version frontmatter field
…Embedder

Recall pipeline Domain layer no longer imports Storage / LlmClient /
EmbeddingStore. Two ports added in recall/ports.rs; RecallEngine moved to
recall/engine.rs and depends on traits only. Concrete adapters live in
storage/recall_repo.rs (StorageSnapshotRepo) and recall/adapters.rs
(LlmQueryEmbedder). All call sites in clx-hook, clx-mcp, and the recall
accuracy bench wire adapters at construction time. Public builder API
preserved. Codex BLOCKER finding resolved.
Fixes 47 distinct pedantic warnings across the 0.8.0 feature crates:
- Missing doc backticks around SQLite/HuggingFace/identifier names
- format! into String -> write!
- push_str single-char -> push
- Collapsed nested if-let chains
- map().unwrap_or() -> map_or()
- match-for-destructuring -> if-let / let-else
- must_use on builder methods
- Removed undeclared test-fixtures cfg gate in clx-hook router

Documented allow attributes on dispatch-shape functions, owned-PathBuf
clap arms, and the App API retained for forward dashboard wiring.
Behavior preservation verified: 1218 tests still pass, 0 fail.
Red/Green/Purple sequential review classified 13 findings:
- 4 DEFENDED (false-positive or existing test coverage)
- 4 PARTIAL (real but mitigated)
- 4 UNDEFENDED (real, no defense)
- 1 SUBSUMED

Three UNDEFENDED items are MUST-FIX in 0.8.0:

F10 (HIGH): PostToolUse audit_log.command stored raw Bash and MCP
commands with inline secrets. The pre_tool_use path wrapped through
log_audit_entry (redact_secrets); post_tool_use bypassed it. Fix:
redact_secrets(command) before AuditLogEntry::new in post_tool_use.rs.

F4 (HIGH): format_recall_context interpolated stored snapshot
summary/key_facts verbatim into the historical-context wrapper. A
malicious clx_remember payload could close the tag and inject
system-style instructions that persist across sessions. Fix: new
sanitize_recall_text helper escapes &lt; and &gt; in both summary and
key_facts paths. Regression test asserts the escaped form.

F1 (MED): redact_secrets free-text scanner missed api_key = sk_... with
whitespace around the separator, lowercase bearer tokens, and
Authorization: Basic ... credentials. Fix: section 3 (case-insensitive
bearer/basic scheme) runs before a new section 2b (whitespace-tolerant
keyword scan that skips scheme tokens). Four regression tests.

Verification: cargo test --workspace = 1223 pass / 0 fail (+5 from
Purple regressions). cargo clippy --workspace --all-targets -D
warnings = exit 0.

CHANGELOG documents both the fixes and the five Purple findings
deferred to 0.8.1 (F3, F5, F7, F8, F9) with rationale per finding.
blackaxgit added 18 commits May 17, 2026 21:34
…omplete F8, fixes /dev/zero unbounded-read DoS)
…eak (harden_command forces model-fetch dryrun)
…hain_trust.rs) from instrumented denominator, documented rationale in Cargo.toml
…ath-guard (fixes macOS symlinked-HOME false-reject)
…nator, 97% deferred to 0.8.1 (injectable provider harness)
…ig path

- redact_title_config_path: longest-known-prefix match handles the
  ratatui panel-title truncation (the prior full-string match only
  worked on the dev HOME, so CI on a different HOME was red)
- canonicalize the volatile tail to fixed width; gated so only the
  title line changes (sessions/audit/rules/detail snaps byte-identical)
- regenerate 9 affected snapshots; verified green on short and long
  clean HOME and cargo insta test --workspace --check (CI Coverage gate)
@blackaxgit blackaxgit marked this pull request as ready for review May 19, 2026 00:34
@blackaxgit blackaxgit merged commit 6b00e12 into main May 19, 2026
8 checks passed
@blackaxgit blackaxgit deleted the feat/0.8.0-memory-skills-coverage branch May 19, 2026 00:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant