feat(0.8.0): memory, skills, coverage release by blackaxgit · Pull Request #25 · blackaxgit/clx

blackaxgit · 2026-05-17T22:20:07Z

Summary

The "memory and quality" release. 17 atomic commits, 1218 tests passing, 0 failing.

Eight user-asked features delivered:

Auto-recall 95% accuracy: RRF (k=60) + bge-reranker-v2-m3 + multiplicative time-decay (30d half-life) + p70 percentile gate
Explicit save preserved + opt-in auto-summarize mode (Stop hook + summarize.rs)
Six model-discoverable skills under plugin/.claude-plugin/ (2026 schema)
Pin recent sessions (opt-in always-on context)
PostToolUse aggregator with tool_events table (60s windowed dedup, atomic UPSERT)
Test coverage push: workspace ignore-regex + 17 ratatui TestBackend snapshots
Mutation testing CI (cargo-mutants v27 on 7 hot modules)
clx config-trust file-hash trustlist for project configs
Refactor for testability: handle_event router + dashboard reducer (Elm pattern)
Contract tests: 7 insta-snapshot fixtures for hook JSON envelopes
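
The ranking stages named in the first item (RRF with k=60, multiplicative time decay with a 30-day half-life, p70 percentile gate) can be sketched in isolation as below. This is a minimal illustration under stated assumptions, not the code in this PR: the function names, the `u64` snapshot ids, and the nearest-rank percentile definition are all hypothetical, and the reranker stage is left out.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: every ranked list contributes 1 / (k + rank) per id,
/// with rank starting at 1. Ids are hypothetical snapshot ids.
pub fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> HashMap<u64, f64> {
    let mut scores = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + idx as f64 + 1.0);
        }
    }
    scores
}

/// Multiplicative time decay: the score halves every `half_life_days`.
pub fn apply_time_decay(score: f64, age_days: f64, half_life_days: f64) -> f64 {
    score * 0.5_f64.powf(age_days / half_life_days)
}

/// Percentile gate: keep only scores at or above the p-th percentile.
/// Nearest-rank definition (an assumption; the real gate may interpolate).
pub fn percentile_gate(scores: &[f64], p: f64) -> Vec<f64> {
    if scores.is_empty() {
        return Vec::new();
    }
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Multiply before dividing so that e.g. p=70, n=10 gives exactly 7.
    let rank = ((p * sorted.len() as f64) / 100.0).ceil() as usize;
    let threshold = sorted[rank.clamp(1, sorted.len()) - 1];
    scores.iter().copied().filter(|s| *s >= threshold).collect()
}
```

With k=60 the fused scores stay close together, so the decay and the gate decide most of the final ordering for candidates of similar rank.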

Architecture (Wave 4b)

Recall pipeline Domain layer no longer imports Storage / LlmClient / EmbeddingStore directly. Two new ports:

recall::ports::SnapshotRepo — implemented by storage::recall_repo::StorageSnapshotRepo
recall::ports::QueryEmbedder — implemented by recall::adapters::LlmQueryEmbedder

RecallEngine depends only on trait references. Layering proof in recall/mod.rs docstring.
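
A rough sketch of the port/adapter shape described above. The trait names come from this PR; the method names, signatures, the `Snapshot` struct, and the in-memory doubles are assumptions made only for illustration.

```rust
/// Hypothetical snapshot record; the real type lives in the clx crates.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub id: u64,
    pub summary: String,
}

/// Port for snapshot lookup (method name and signature are assumptions).
pub trait SnapshotRepo {
    fn search(&self, embedding: &[f32], limit: usize) -> Vec<Snapshot>;
}

/// Port for turning a query string into an embedding (signature assumed).
pub trait QueryEmbedder {
    fn embed(&self, query: &str) -> Vec<f32>;
}

/// The engine holds only trait references, so the domain layer never names
/// a concrete storage or LLM type.
pub struct RecallEngine<'a> {
    repo: &'a dyn SnapshotRepo,
    embedder: &'a dyn QueryEmbedder,
}

impl<'a> RecallEngine<'a> {
    pub fn new(repo: &'a dyn SnapshotRepo, embedder: &'a dyn QueryEmbedder) -> Self {
        Self { repo, embedder }
    }

    pub fn query(&self, text: &str, limit: usize) -> Vec<Snapshot> {
        let embedding = self.embedder.embed(text);
        self.repo.search(&embedding, limit)
    }
}

/// In-memory doubles standing in for StorageSnapshotRepo / LlmQueryEmbedder.
pub struct InMemoryRepo(pub Vec<Snapshot>);

impl SnapshotRepo for InMemoryRepo {
    fn search(&self, _embedding: &[f32], limit: usize) -> Vec<Snapshot> {
        self.0.iter().take(limit).cloned().collect()
    }
}

pub struct FixedEmbedder;

impl QueryEmbedder for FixedEmbedder {
    fn embed(&self, query: &str) -> Vec<f32> {
        vec![query.len() as f32]
    }
}
```

Call sites wire the concrete adapters at construction time, which is also what lets tests swap in doubles like the two above.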

Decisions (user-approved 2026-05-16)

#	Question	Decision
1	Reranker default	true in 0.8.0 (ships with `clx model fetch` background prefetch + 250 ms graceful degradation)
2	`tool_events` retention	30d default, configurable via `retention.tool_events_days`
3	Auto-summarize	IN 0.8.0, opt-in (default off)
4	Coverage CI gate	warn-only in 0.8.0, hard-fail flip in 0.8.1
5	Golden set	synthetic-only (no user content, no PHI)

Two-round comprehensive review

Round 1 (Codex) found and self-fixed 2 BLOCKERs:

AutoRecallConfig was missing 6 ranking knobs (rrf_enabled, rrf_k, etc.) — production callers always used Default
Auto-summarize counted event_type='message' rows that are never written — fix counts tool_events

Round 2 (4 parallel agents) closed all remaining findings:

JSON-recursive secret redaction
SHA-256/integrity gate on clx model fetch
Reranker cold-load timeout (ensure_loaded moved into spawn_blocking)
clx model fetch --force lock ordering inversion
tool_events upsert race (schema v7 UNIQUE INDEX + INSERT ... ON CONFLICT DO UPDATE)
stop_auto_summary double-write race
fastembed 4 → 5.13.4 (ort 2.0 line; transitively ort 2.0.0-rc.12, see Risks)
criterion 0.5 → 0.8.2
Plugin schema cleanup (drop mcp_servers: {}, drop SKILL version:, add author/license)
Recall Domain port extraction (BLOCKER architecture finding)
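
The cold-load timeout fix amounts to putting model load and scoring under one deadline and falling back to the RRF-only order when it expires. The PR does this with `spawn_blocking` under `tokio::time::timeout`; the sketch below swaps in a plain thread and `recv_timeout` from the standard library so it stays dependency-free. The function name and the `u64` ids are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `rerank` (cold load plus scoring) on a worker thread and waits at most
/// `timeout`. On timeout, or if the worker dies, the RRF-only order is kept.
pub fn rerank_or_fallback<F>(rrf_order: Vec<u64>, timeout: Duration, rerank: F) -> Vec<u64>
where
    F: FnOnce(Vec<u64>) -> Vec<u64> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let input = rrf_order.clone();
    thread::spawn(move || {
        // Load and score happen inside the same worker, so both count
        // against the deadline. A failed send just means the caller gave up.
        let _ = tx.send(rerank(input));
    });
    match rx.recv_timeout(timeout) {
        Ok(reranked) => reranked,
        Err(_) => rrf_order, // graceful degradation to RRF-only
    }
}
```

The point of the original fix is the same as here: if the load ran outside the timed region, a slow first load would not be bounded by the 250 ms budget.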

Schema migrations

v6: adds tool_events table + 2 indexes (additive, no destructive changes)
v7: adds UNIQUE INDEX tool_events_dedup_idx for atomic upsert (additive)
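
To make the dedup key concrete, here is an in-memory analogue of the v7 upsert. The key mirrors the unique index described in this PR, (session_id, tool_name, IFNULL(target,''), window_end_unix/60); the `count` column and the choice of which fields get updated on conflict are assumptions for illustration, and a `HashMap` stands in for SQLite.

```rust
use std::collections::HashMap;

/// Dedup key: (session_id, tool_name, target or "", 60-second bucket).
type DedupKey = (String, String, String, i64);

#[derive(Debug, Clone, PartialEq)]
pub struct ToolEventRow {
    pub count: u64, // hypothetical aggregate column
    pub window_end_unix: i64,
}

#[derive(Default)]
pub struct ToolEvents {
    rows: HashMap<DedupKey, ToolEventRow>,
}

impl ToolEvents {
    /// Inserts a new row, or extends the existing one when the key collides.
    /// In-memory analogue of INSERT ... ON CONFLICT DO UPDATE.
    pub fn append_or_extend(
        &mut self,
        session_id: &str,
        tool_name: &str,
        target: Option<&str>,
        ts_unix: i64,
    ) -> &ToolEventRow {
        let key = (
            session_id.to_string(),
            tool_name.to_string(),
            target.unwrap_or("").to_string(), // IFNULL(target, '')
            ts_unix / 60,                     // 60-second bucket
        );
        let row = self.rows.entry(key).or_insert(ToolEventRow {
            count: 0,
            window_end_unix: ts_unix,
        });
        row.count += 1;
        row.window_end_unix = row.window_end_unix.max(ts_unix);
        row
    }

    pub fn row_count(&self) -> usize {
        self.rows.len()
    }
}
```

Doing the insert-or-update as one keyed operation is what removes the race: there is no separate "look up, then write" step for two hooks to interleave.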

Test plan

cargo test --workspace passes locally on macOS (verified: 1218 pass, 0 fail, 10 ignored)
cargo check --workspace clean (verified)
Live recall smoke on Azure-routed install: clx recall "previous discussion on Azure backend" returns hits within 500 ms p95
clx model fetch downloads bge-reranker-v2-m3 (568 MB) and writes .ready only after integrity gate
clx config-trust add <repo>/.clx/config.yaml registers hash; edit the file, recall re-trusts only after clx config-trust add rerun
Six skills load via Claude Code 2026 plugin discovery (claude /skills list)
clx maintenance trim removes tool_events rows older than 30d
Auto-summarize with memory.auto_summarize.enabled: true writes one AutoSummary snapshot per 5 turns

Deferred to 0.8.1 (documented)

summarize.rs Domain port extraction
policy/{rules,llm}.rs Domain port extraction
query_percentile_gate duplication (subagent + mcp/recall)
Hard-fail coverage CI gate (warn-only in 0.8.0)
User-derived RAGAS golden set layer

Risks

bge-reranker-v2-m3 is a 568 MB one-time download — first recall after upgrade may use RRF-only until prefetch completes; set auto_recall.reranker_enabled: false to opt out
Plugin path migration is a manual one-shot via plugin/scripts/migrate.sh; old plugin/skills/ removed in 0.9.0
fastembed v5 ships with ort 2.0.0-rc.12 transitively (waiting on GA-stable in a future fastembed point release)

Branch state

17 commits, all atomic
cargo check: clean
cargo test --workspace: 1218 pass / 0 fail / 10 ignored
working tree: clean

- F: fastembed bge-reranker-v2-m3 cross-encoder stage in recall pipeline with 250 ms timeout + graceful RRF-only fallback; clx model fetch/status/list CLI; background prefetch from UserPromptSubmit once per process
- G: auto_recall.pin_recent_sessions opt-in injects last-N session summaries with current-session self-pin guard; new Storage::recent_session_summaries
- H: memory.auto_summarize opt-in Stop hook rolls up N-turn windows into AutoSummary snapshots via configured chat LLM; deterministic template fallback when LLM unavailable; new SnapshotTrigger::AutoSummary

…iring

- I: workspace llvm-cov ignore-regex (event.rs, main.rs, runtime.rs) + 17 ratatui TestBackend + insta snapshots for dashboard/ui/detail.rs and dashboard/settings/render.rs
- J: cargo-mutants v27 configuration (mutants.toml) + weekly baseline workflow + PR-diff workflow + docs/mutation-testing.md
- K: 30-pair synthetic RAGAS golden set (no PHI, no user content) + criterion bench benches/recall_accuracy.rs over rrf_enabled true/false
- L: clx config-trust file-hash trustlist (parallel to existing PR #15 trust-mode); bypasses inert filter at project.rs:86 for trusted hashes; ~/.clx/trusted_configs.json with 0600 mode; 15 unit + 4 integration tests
- M: clx-hook main.rs slimmed 196 to 126 LoC, delegates to lib router; dashboard event loop now drives reducer via all DashboardEvent variants (Key, Resize, Tick, Quit); dead_code warnings on Resize/Tick/Quit gone
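
For readers unfamiliar with the reducer pattern mentioned in item M, a small sketch follows. The four `DashboardEvent` variant names are from this PR; their payloads, the `DashboardState` fields, and the `j`/`k` key bindings are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DashboardEvent {
    Key(char),        // payload type is an assumption
    Resize(u16, u16), // width, height
    Tick,
    Quit,
}

#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct DashboardState {
    pub selected: usize,
    pub size: (u16, u16),
    pub ticks: u64,
    pub should_quit: bool,
}

/// Pure reducer: (state, event) -> new state. No I/O, so it can be
/// unit-tested without a terminal.
pub fn reduce(state: DashboardState, event: DashboardEvent) -> DashboardState {
    match event {
        DashboardEvent::Key('j') => DashboardState { selected: state.selected + 1, ..state },
        DashboardEvent::Key('k') => DashboardState {
            selected: state.selected.saturating_sub(1),
            ..state
        },
        DashboardEvent::Key(_) => state,
        DashboardEvent::Resize(w, h) => DashboardState { size: (w, h), ..state },
        DashboardEvent::Tick => DashboardState { ticks: state.ticks + 1, ..state },
        DashboardEvent::Quit => DashboardState { should_quit: true, ..state },
    }
}
```

Because the event loop only folds events through `reduce`, rendering can be snapshot-tested separately from state transitions.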

- Add rrf_enabled, rrf_k, time_decay_half_life_days, percentile_gate, reranker_enabled, reranker_timeout_ms to AutoRecallConfig with defaults
- Map AutoRecallConfig fields into RecallQueryConfig in hook subagent
- Attach FastembedReranker in hook and MCP recall when reranker_enabled=true

…s, plugin schema

- A (security): redact_json_value walks serde_json::Value recursively, redacting 20 sensitive key patterns case-insensitive; verify_model_dir_complete enforces required files non-zero before .ready sentinel; clx model fetch --force now acquires lock before destructive remove_dir_all
- B (concurrency): migrate_to_v7 adds UNIQUE INDEX tool_events_dedup_idx on (session_id, tool_name, IFNULL(target,''), window_end_unix/60); append_or_extend_tool_event rewritten as INSERT ... ON CONFLICT DO UPDATE for atomic upsert; stop_auto_summary re-reads last AutoSummary timestamp before write to skip duplicate snapshots on rapid Stop hooks
- C (async + deps): FastembedReranker::score lazy-loads inside the same spawn_blocking as the rerank call so tokio::time::timeout governs cold load; fastembed 4 -> 5 (ort 2.0 stable); criterion 0.5 -> 0.8
- D (plugin schema): plugin.json drops non-spec mcp_servers, adds author + license; all 6 SKILL.md strip non-spec version frontmatter field
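
The recursive redaction in item A can be pictured with the sketch below. It is only an approximation: a small local `Json` enum stands in for `serde_json::Value` to keep the example dependency-free, the four key patterns are an illustrative subset (the PR mentions 20 without listing them), and substring matching plus the `[REDACTED]` placeholder are assumptions.

```rust
/// Minimal JSON value used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Illustrative subset of sensitive key patterns (hypothetical).
const SENSITIVE_KEYS: [&str; 4] = ["password", "token", "api_key", "secret"];

/// Case-insensitive match of a key against the pattern list.
fn is_sensitive(key: &str) -> bool {
    let lower = key.to_lowercase();
    SENSITIVE_KEYS.iter().any(|pat| lower.contains(pat))
}

/// Walks the value depth-first and replaces the value under any sensitive
/// key, at any nesting level, with a fixed placeholder.
pub fn redact_json(value: &mut Json) {
    match value {
        Json::Array(items) => items.iter_mut().for_each(redact_json),
        Json::Object(fields) => {
            for (key, child) in fields.iter_mut() {
                if is_sensitive(key) {
                    *child = Json::Str("[REDACTED]".to_string());
                } else {
                    redact_json(child);
                }
            }
        }
        _ => {} // scalars carry no keys to inspect
    }
}
```

The recursion matters because a flat scan of top-level keys would miss secrets nested inside arrays or sub-objects of a tool payload.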

…Embedder

Recall pipeline Domain layer no longer imports Storage / LlmClient / EmbeddingStore. Two ports added in recall/ports.rs; RecallEngine moved to recall/engine.rs and depends on traits only. Concrete adapters live in storage/recall_repo.rs (StorageSnapshotRepo) and recall/adapters.rs (LlmQueryEmbedder). All call sites in clx-hook, clx-mcp, and the recall accuracy bench wire adapters at construction time. Public builder API preserved. Codex BLOCKER finding resolved.

Fixes 47 distinct pedantic warnings across the 0.8.0 feature crates:

- Missing doc backticks around SQLite/HuggingFace/identifier names
- format! into String -> write!
- push_str single-char -> push
- Collapsed nested if-let chains
- map().unwrap_or() -> map_or()
- match-for-destructuring -> if-let / let-else
- must_use on builder methods
- Removed undeclared test-fixtures cfg gate in clx-hook router

Documented allow attributes on dispatch-shape functions, owned-PathBuf clap arms, and the App API retained for forward dashboard wiring.

Behavior preservation verified: 1218 tests still pass, 0 fail.

Red/Green/Purple sequential review classified 13 findings:

- 4 DEFENDED (false-positive or existing test coverage)
- 4 PARTIAL (real but mitigated)
- 4 UNDEFENDED (real, no defense)
- 1 SUBSUMED

Three UNDEFENDED items are MUST-FIX in 0.8.0:

F10 (HIGH): PostToolUse audit_log.command stored raw Bash and MCP commands with inline secrets. The pre_tool_use path wrapped through log_audit_entry (redact_secrets); post_tool_use bypassed it. Fix: redact_secrets(command) before AuditLogEntry::new in post_tool_use.rs.

F4 (HIGH): format_recall_context interpolated stored snapshot summary/key_facts verbatim into the historical-context wrapper. A malicious clx_remember payload could close the tag and inject system-style instructions that persist across sessions. Fix: new sanitize_recall_text helper escapes < and > in both summary and key_facts paths. Regression test asserts the escaped form.

F1 (MED): redact_secrets free-text scanner missed api_key = sk_... with whitespace around the separator, lowercase bearer tokens, and Authorization: Basic ... credentials. Fix: section 3 (case-insensitive bearer/basic scheme) runs before a new section 2b (whitespace-tolerant keyword scan that skips scheme tokens). Four regression tests.

Verification: cargo test --workspace = 1223 pass / 0 fail (+5 from Purple regressions). cargo clippy --workspace --all-targets -D warnings = exit 0. CHANGELOG documents both the fixes and the five Purple findings deferred to 0.8.1 (F3, F5, F7, F8, F9) with rationale per finding.
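
The F4 fix is easiest to see with a tiny example. The helper name `sanitize_recall_text` is from this PR, but its body here is a guess: the entity-style escaping (`&lt;`, `&gt;`), the single-argument `format_recall_context`, and the `<historical-context>` tag name are assumptions chosen only to show why escaping angle brackets blocks tag-closing payloads.

```rust
/// Escapes angle brackets so stored text cannot open or close a wrapper tag.
/// The entity form is an assumption; the PR only states that < and > are escaped.
pub fn sanitize_recall_text(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            other => out.push(other),
        }
    }
    out
}

/// Hypothetical wrapper; the real tag name and signature are not given here.
pub fn format_recall_context(summary: &str) -> String {
    format!(
        "<historical-context>{}</historical-context>",
        sanitize_recall_text(summary)
    )
}
```

After sanitizing, a stored summary that tries to close the wrapper early ends up as inert text inside it, so the only closing tag in the output is the one the formatter wrote.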

…ig path

- redact_title_config_path: longest-known-prefix match handles the ratatui panel-title truncation (the prior full-string match only worked on the dev HOME, so CI on a different HOME was red)
- canonicalize the volatile tail to fixed width; gated so only the title line changes (sessions/audit/rules/detail snaps byte-identical)
- regenerate 9 affected snapshots; verified green on short and long clean HOME and cargo insta test --workspace --check (CI Coverage gate)

blackaxgit added 30 commits May 16, 2026 19:40

docs(0.8.0): add memory/skills/coverage design spec

bd29dc0

docs(0.8.0): add 5 implementation plans (groups A-E)

9bcd81b

feat(plugin): migrate to 2026 .claude-plugin layout with 6 named skills

ba32c81

refactor(clx-hook): extract handle_event router + 7 contract tests

60cd0ae

refactor(clx/dashboard): scaffold pure reducer + state separation

5a61272

feat(clx-core): tool_events table + aggregator + retention + maintenance cmd

c9d24f9

feat(recall): add RRF rank fusion + time-decay + percentile gate modules

581f3a8

feat(post_tool_use): wire tool_events aggregator into PostToolUse hook

43f1c3f

feat(recall): wire RRF + time-decay + percentile gate into RecallEngine::query

ddfeb4e

release(0.8.0): bump workspace version 0.7.2 -> 0.8.0 + CHANGELOG

2ac3e96

fix(storage): count tool_events for auto-summary turn threshold

b2087dc

Replaces event_type='message' query that never matched production writes with a count of tool_events rows for the current session.

docs(0.8.0): polish stale comments + document Wave 4a fixes in CHANGELOG

eca97d6

fix(security): F3 TOCTOU + F8 transcript path hardening (Purple 0.8.1 -> 0.8.0)

25c7eff

fix(security): F5 summarizer injection fence + F7 hook envelope provenance

2511187

fix(security): F9 reranker model SHA-256 pin + CHANGELOG (all 5 Purple findings now in 0.8.0)

bc6591b

fix(clx-mcp): session-scoped credential cache to stop repeated keychain prompts

6ed77b5

style(0.8.0): cargo fmt + CHANGELOG accuracy (fastembed 5.x, drop commit count) per final gate

263f3cd

feat(clx): relax keychain ACL at store time + clx keychain-trust repair command

744e234

fix(install): register Stop hook, install 6 skills, version stamp, config merge

b494158

feat(scripts): add install-local.sh for brew-free source install (brew-identical wiring)

88fea82

fix(scripts): BSD chmod (no --) + MCP verify grep matches clx-mcp command path

b23826a

fix(keychain): keychain-trust uses single partition-list call (one prompt, not per-item loop)

1fe3d81

feat(credentials): default to age-encrypted file backend, keychain opt-in (eliminates macOS prompt)

6a50268

blackaxgit added 18 commits May 17, 2026 21:34

docs(0.8.0): CHANGELOG entry for file-default credential backend

8785295

fix(credentials): derive list from backend (no index race) + clippy clean lock guard

1b8515a

docs(0.8.0): comprehensive pre-release functional + behavior + verification spec

650f397

fix(0.8.0): resolve 4 HIGH pre-tag blockers (license MPL-2.0, L1 deny band, L1 timeout, denial_count)

1011d5f

fix(clx-hook): bound transcript reads to regular files + 64MiB cap (complete F8, fixes /dev/zero unbounded-read DoS)

392f93c

test(clx-hook): RAII isolated-HOME helper eliminates 2.1GB temp-dir leak (harden_command forces model-fetch dryrun)

f55a080

chore(test): local test harness (nextest + llvm-cov) + published coverage denominator policy

6184055

test(0.8.0): coverage campaign Wave 1 - behavior + e2e + pixel suites (+252 tests)

2112852

docs(0.8.0): risk-register triage (13 accepted / 3 cheap-fix / 5 defer / 2 false-pos) + CHANGELOG Known issues

e19eee5

fix(test): nextest sqlite-vec isolation in embeddings test + cov harness LLVM auto-resolve & no-fail-fast

15fadff

fix(0.8.0): risk-register cheap fixes - C-R1 key suffix, M-R1 refuse-newer-DB guard, I-R2 Stop contract fixture

f0f8b88

test(clx): assert_cmd e2e suites for recall/config/embeddings/rules CLI (+43 hermetic tests)

9908d87

chore(cov): exclude irreducible glue (dashboard/mod.rs, commands/keychain_trust.rs) from instrumented denominator, documented rationale in Cargo.toml

d92a349

test(0.8.0): deep success/branch-pipeline coverage wave (+77 tests, instrumented 83.02% -> 85.68%+)

68f6de8

fix(clx-mcp): clx_rules get_project_rules canonicalizes home before path-guard (fixes macOS symlinked-HOME false-reject)

26a125a

docs(0.8.0): honest coverage disposition - 85.72% on published denominator, 97% deferred to 0.8.1 (injectable provider harness)

6f2b36a

test(clx-hook): RAII tempfile::TempDir in transcript.rs inline tests (eliminate residual leak pattern)

f5c843a

blackaxgit marked this pull request as ready for review May 19, 2026 00:34

blackaxgit merged commit 6b00e12 into main May 19, 2026
8 checks passed

blackaxgit deleted the feat/0.8.0-memory-skills-coverage branch May 19, 2026 00:34

blackaxgit merged 48 commits into main from feat/0.8.0-memory-skills-coverage
