
Add aim-sot Source-of-Truth subsystem #187

Merged: Hidden-History merged 37 commits into main from feat/aim-sot-wave1 on Jun 16, 2026

Hidden-History (Owner) commented Jun 13, 2026

Summary

Ships aim-sot — a Source-of-Truth subsystem that tracks where the truth lives for each part of the end-user's own project, and whether it's still accurate. It is a generic, multi-CLI feature: the registry lives in the user's repo, consultable by the user's agents, kept current by a propose-only trigger behind a mandatory in-skill verify gate. Behavior model throughout: detect-and-propose + verify-gate, never silent auto-rewrite.

The feature ships on all four CLIs — Claude Code, Codex, Cursor, and Gemini. The registry, consult, detect-propose, and verify modes are shared; each CLI gets a propose-only end-of-turn trigger.

Design spec: oversight/specs/SOT-SUBSYSTEM-DESIGN-2026-06-07b.md (in the oversight workspace).

What's in this PR (5 parts)

  1. Registry of record: .sot/registry.yaml lives in the user's repo, committed and human-authored. It ships as a JSON Schema (draft-07) plus .sot/registry.yaml and .sot/README.md templates. No project data is baked into the skill; additionalProperties:false enforces the no-copy / no-machine-field invariant. (8-kind taxonomy; path|component|concern boundary types; 6 core fields + status.)

  2. aim-sot skill — consult mode — read-only "where does X live / who owns it / how is it drift-checked," served from the derived memory cache with a registry fallback.

  3. aim-sot skill — detect-propose engine — hybrid auto-discovery → proposes a minimal patch on drift or new candidates. Never writes the registry. Maintains a per-install drift cache (~/.ai-memory/drift-state/, atomic + lock + 7-day TTL throttle) and a derived conventions memory cache (memory_type="sot_entry", rebuildable).

  4. aim-sot skill — verify mode — the 16-check gate (Schema / Referential / Completeness / Content) → PASS / CONDITIONAL / FAIL. Schema checks are driven from registry.schema.json at runtime (stdlib only — no new dependency). Mandatory in-skill; CI / pre-commit is opt-in only, never auto-installed into the user's repo.

  5. Per-CLI trigger hooks — a propose-only end-of-turn trigger per CLI that invokes the detect-propose engine and surfaces drift / new-candidate counts to stderr; fail-open on every path (never blocks the CLI):

    • Claude: .claude/hooks/scripts/sot_drift_stop.py (Stop), wired via merge_settings.py. Loop-guarded (stop_hook_active), worktree-cwd-normalized.
    • Codex: src/memory/adapters/codex/sot_drift.py (Stop)
    • Cursor: src/memory/adapters/cursor/sot_drift.py (PreCompact)
    • Gemini: src/memory/adapters/gemini/sot_drift.py (PreCompress → canonical PreCompact)

    The three non-Claude adapters normalize their native stdin via the existing per-CLI schema, take cwd from the normalized event, gate on the presence of .sot/registry.yaml, and reuse the shared engine-invocation core. All git-agnostic (TTL re-check, no git dependency).
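
The trigger contract in part 5 (gate on the registry, guard against re-entry, report to stderr, never block the CLI) can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped hook: `run_trigger`, `has_registry`, and `invoke_engine` are hypothetical names, and the real hooks reach the engine through run-with-env.sh rather than an injected callable.

```python
import json
import sys
from typing import Callable


def run_trigger(
    raw_stdin: str,
    has_registry: Callable[[str], bool],
    invoke_engine: Callable[[str], str],
) -> int:
    """Hypothetical fail-open, propose-only trigger; always returns 0."""
    try:
        event = json.loads(raw_stdin or "{}")
        # Loop guard: skip when the CLI reports the stop hook is already active.
        if event.get("stop_hook_active"):
            return 0
        cwd = event.get("cwd", ".")
        # Opt-in gate: no .sot/registry.yaml means the project is not using SOT.
        if not has_registry(cwd):
            return 0
        # The engine only proposes; its summary is surfaced on stderr.
        print(invoke_engine(cwd), file=sys.stderr)
    except Exception:
        # Fail-open: malformed stdin or an engine error must never block the CLI.
        pass
    return 0
```

Injecting the two callables keeps the sketch testable without a filesystem or subprocess; a real hook would bind them to a registry-presence check and the engine invocation.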

Notes for review

  • Trigger placement: Claude is the native platform, so its hook lives in Claude's native hooks tree (.claude/hooks/scripts/, wired via merge_settings.py), not src/memory/adapters/claude/. Only Codex/Cursor/Gemini use the src/memory/adapters/{cli}/ layer. This matches the convention in #185, "Ship an AI-Memory agent-guidance file to projects via the installer (TD-600)" (Claude → .claude/…, the other three → adapters/templates/{cli}/).
  • Per-CLI event model: the shipped adapters hook each CLI's existing end-of-session event (Codex Stop, Cursor preCompact, Gemini PreCompress); the canonical event vocabulary (schema.py VALID_HOOK_EVENTS) is untouched.
  • Opt-in OFF by default (portability rule): every trigger ships unregistered — none is added to a user's hook config on install. The tool never writes into a user's VCS/hook config without explicit opt-in. SKILL.md documents the manual-enablement snippet for each CLI (.claude/settings.json, .codex/hooks.json, .cursor/hooks.json, .gemini/settings.json). The engine also runs standalone (manual/cron) as the no-hook default.
  • SKILL.md invocation paths: the consult / detect-propose / verify invocation blocks pass the full script path to run-with-env.sh (${AI_MEMORY_INSTALL_DIR:-$HOME/.ai-memory}/_ai-memory/skills/aim-sot/scripts/…); the bare-name form resolved to scripts/memory/ and failed "Script not found" from a normal CWD.
  • Behavior-preserving / additive: no existing workflow, skill, adapter, or hook behavior changes. A new SOT_ENTRY memory type was added to the enum but is intentionally system-internal (not wired into search / intent / classifier).

Testing

Each mode ships a companion test deliverable (contract tests for schema/templates; hermetic unit tests for the engine modes and every trigger). All tests live under top-level tests/. The Claude hook has 14 hermetic tests; each of the three adapters has 10 (30 total) — covering the registry opt-in gate, invocation correctness (engine argv + mutation guards), non-git fallback, the propose-only / no-write guarantee (AST-based source scan with a mutation guard), and fail-open paths (empty / malformed stdin, engine non-zero exit, subprocess timeout, SIGALRM). The non-Claude adapter tests exercise the real per-CLI schema normalizers.

Deferred (tracked as code TODOs, not in this PR)

  • Directory content-hash drift (dir-tree digest) for component/path boundaries whose sot_location is a directory.
  • Wiring sot_entry into the search / intent / classifier surfaces (kept system-internal for now).

Hidden-History changed the title from "Add aim-sot Source-of-Truth subsystem (Wave 1 — Claude)" to "Add aim-sot Source-of-Truth subsystem" on Jun 13, 2026
Hidden-History force-pushed the feat/aim-sot-wave1 branch 2 times, most recently from 5d3e6ad to 79ded32, on June 15, 2026 02:41
WB Solutions and others added 27 commits June 14, 2026 21:26
First build step of the aim-sot source-of-truth subsystem: the registry
contract that later steps and every user's registry depend on.

- schema/registry.schema.json (JSON Schema draft-07) for .sot/registry.yaml:
  6 required core fields (id, kind, boundary_type, sot_location, owner,
  description), kind/boundary_type/status enums, provenance and pointer-link
  fields. additionalProperties:false at document and entry level enforces the
  no-copy / no-machine-field invariants on the committed registry.
- templates/: user-facing registry.yaml + README starters; no project data,
  illustrative examples only.
- SKILL.md stub; consult/detect-propose/verify modes land in later steps.
- 23 contract tests: schema structure, enum/required validation, instance-level
  rejection of unknown fields, and template validity.
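
To make the registry contract concrete, here is a stdlib-only sketch of the two rules this commit describes: the six required core fields and the rejection of unknown fields (the effect of additionalProperties:false). It is an illustration, not the schema itself: `check_entry` is a hypothetical helper, `OPTIONAL_FIELDS` lists only `status` because the provenance and pointer-link field names are not given here, and the sample entry values are invented.

```python
from typing import Dict, List

# The six required core fields named in the commit message.
REQUIRED_FIELDS = ("id", "kind", "boundary_type", "sot_location", "owner", "description")
# Assumption: only "status" is listed; the real schema allows further optional fields.
OPTIONAL_FIELDS = frozenset({"status"})
BOUNDARY_TYPES = frozenset({"path", "component", "concern"})


def check_entry(entry: Dict[str, object]) -> List[str]:
    """Return a list of problems for one registry entry (empty means OK)."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in entry
    ]
    # Mirrors additionalProperties:false: anything not declared is rejected.
    allowed = set(REQUIRED_FIELDS) | OPTIONAL_FIELDS
    problems += [f"unknown field: {name}" for name in sorted(set(entry) - allowed)]
    boundary = entry.get("boundary_type")
    if boundary is not None and boundary not in BOUNDARY_TYPES:
        problems.append(f"invalid boundary_type: {boundary}")
    return problems


example = {
    "id": "billing-api",            # hypothetical entry
    "kind": "api",
    "boundary_type": "concern",
    "sot_location": "docs/openapi.yaml",
    "owner": "@payments-team",
    "description": "Published contract for the billing service.",
}
print(check_entry(example))
```

Rejecting unknown keys is what keeps copied content and machine-written fields out of the committed registry.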
Second build step. Adds the consult engine over .sot/registry.yaml.

- Subcommands list/get/where/who/drift; --json and --registry flags.
- Registry resolution: --registry override -> git-root -> parent-walk
  (non-git fallback), for portability across project layouts.
- Read-only: reads the registry only; light structural validation
  (YAML parse + entries-is-list). Full schema validation is deferred to
  verify mode; resilient to a malformed sibling entry.
- _load_entries() is the single seam for the Item-3 derived memory cache.
- run-with-env.sh (Pattern B) invocation documented in SKILL.md.
- 48 tests incl. git-root/parent-walk resolution and a full main()-level
  read-only guarantee.
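
The resolution order above (explicit override first, then a walk up the directory tree) can be sketched with pathlib alone. This is a simplified illustration: `find_registry` is a hypothetical name, and the git-root step is omitted so the sketch has no git dependency; only the override and the parent-walk fallback are shown.

```python
from pathlib import Path
from typing import Optional


def find_registry(start: Path, override: Optional[Path] = None) -> Optional[Path]:
    """Locate .sot/registry.yaml via an override or a parent-directory walk."""
    # An explicit override wins and is never second-guessed.
    if override is not None:
        return override if override.is_file() else None
    current = start.resolve()
    # Walk from the starting directory up to the filesystem root.
    for directory in (current, *current.parents):
        candidate = directory / ".sot" / "registry.yaml"
        if candidate.is_file():
            return candidate
    return None
```

The function only reads the filesystem, which matches the read-only guarantee the consult tests assert.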
…very

Move test_registry_schema.py and test_sot_consult.py from co-located _ai-memory/skills/aim-sot/tests/ (invisible to CI's pytest tests/, testpaths=["tests"]) to top-level tests/, matching the repo convention. Convert the consult test import to the importlib.util.spec_from_file_location pattern; repoint the schema test data paths. Behavior-preserving, no test-logic changes; 48 tests pass.
Wave-1 Item 3. Adds the detect-propose mode: hybrid auto-discovery (manifest->component, top-dir->path, ADR->concern; semantic fields human-owned), the four-type drift taxonomy (location, temporal-staleness, content-hash, declaration-vs-reality/K1, local-hash only), and propose-only output (root-cause + impact; never writes .sot/registry.yaml). Adds a per-install drift cache (atomic write + advisory lock + 7-day TTL throttle, bypassed on a registry edit while preserving hash baselines) and a derived memory cache that reindexes registry entries into the conventions collection (MemoryType.SOT_ENTRY); consult reads through it with a graceful file fallback. Stateless first-run candidate cap. Adds MemoryType.SOT_ENTRY and updates dependent type-count/coverage assertions. Unit tests cover each drift type, the never-writes invariant, cache atomicity/lock/TTL, and reindex determinism.
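
The atomic-write part of the per-install drift cache can be sketched as a write-to-temp, fsync, then rename sequence. This is a minimal illustration, not the engine's code: `atomic_write_json` is a hypothetical helper, the advisory lock is left out (locking APIs differ per platform), and the JSON payload shape is invented.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # flush to disk before the rename
        os.replace(tmp_name, path)     # atomic swap on POSIX and Windows
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Because the swap is a single rename, a crash mid-write leaves the previous cache file untouched rather than truncated.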
Add aim_sot_verify.py implementing the BP-024 verification taxonomy
(schema, referential, completeness, content checks) over .sot/registry.yaml,
emitting a structured PASS/CONDITIONAL/FAIL verdict. Structural checks are
driven from registry.schema.json. URL resolution and drift-check execution
are opt-in; owner validation uses CODEOWNERS when present. Adds the verify
mode entry in SKILL.md and a hermetic test suite.
Propose-only end-of-turn hook that invokes the detect-propose engine and surfaces drift/new-candidate counts to stderr; ships unregistered (opt-in). Also corrects the SKILL.md invocation blocks to pass full script paths to run-with-env.sh.
…ve 2)

Per-CLI propose-only trigger adapters that invoke the aim-sot detect-propose
engine on each CLI's end-of-session event, replicating the Claude Stop hook:

  - codex:  Stop       (src/memory/adapters/codex/sot_drift.py)
  - cursor: PreCompact  (src/memory/adapters/cursor/sot_drift.py)
  - gemini: PreCompact  (native PreCompress; src/memory/adapters/gemini/sot_drift.py)

Each adapter normalizes its native stdin via the existing per-CLI schema,
gates on the presence of .sot/registry.yaml (opt-in), and runs the engine in
propose-only mode via run-with-env.sh. Fail-open throughout; never writes any
committed file. Ships unregistered per the portability rule. 30 hermetic tests.
Add manual-enablement instructions for the Codex/Cursor/Gemini sot_drift
trigger adapters (they ship unregistered per the portability rule), mirroring
the Claude Stop hook opt-in section, with each CLI's exact hook-config format.

Also reword the Modes section to drop internal build-sequencing references
that don't belong in user-facing skill content.
Harden the aim-sot detect-propose engine so an unconfirmed drift can no
longer advance the baseline it is meant to flag, and bound its filesystem
and store interactions.

Baseline / throttle (5a cache):
- Stat the artifact (mtime/size) before honouring the 7-day TTL skip and
  re-check when it changed, closing the window where an artifact-only edit
  was invisible for up to a week.
- Hold last_verified_sha on detected drift instead of advancing it, so the
  proposal re-fires until resolved; advance only when the entry is clean or
  the registry's human last_verified is newer (a re-confirmation). A
  first-sighting entry records drift_status="unverified", not "clean".

Reindex (5b cache):
- Load the registry and construct storage before deleting existing points
  (prepare-then-replace), and keep existing points on a transiently-empty or
  unparseable registry. Return an explicit success/failure result and advance
  registry_sha only on success. Serialize the reindex per project_id.

Discovery / safety:
- Derive the project root only from a conforming <root>/.sot/registry.yaml
  and skip discovery for a flat registry path, preventing an unbounded scan.
- Prune skipped directories during traversal and cap the walk; hash files in
  chunks; fsync before replace and clean up the temp file on a failed write.

Companion tests cover edit-within-TTL detection, baseline-hold then re-fire
then human-reconfirm re-baseline, cold-start unverified, reindex
failure/empty preservation and idempotency, lock release on error, and the
flat-registry no-scan guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
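
The baseline and throttle rules in this commit can be sketched as two small pure functions. This is an illustrative model, not the engine's data layout: `CacheRecord`, `needs_recheck`, and `updated_record` are hypothetical names, the "drifted" status label is an assumption (the message names only "clean" and "unverified"), and re-checking any non-clean record inside the TTL is one plausible way to make the proposal re-fire.

```python
from dataclasses import dataclass
from typing import Optional

TTL_SECONDS = 7 * 24 * 60 * 60  # the 7-day throttle


@dataclass(frozen=True)
class CacheRecord:
    checked_at: float
    mtime: float
    size: int
    last_verified_sha: str
    drift_status: str  # "clean", "unverified", or (assumed) "drifted"


def needs_recheck(record: Optional[CacheRecord], mtime: float, size: int, now: float) -> bool:
    """Decide whether the TTL skip may be honoured for this artifact."""
    if record is None or record.drift_status != "clean":
        return True
    # An artifact-only edit changes mtime/size and must not hide behind the TTL.
    if (mtime, size) != (record.mtime, record.size):
        return True
    return now - record.checked_at >= TTL_SECONDS


def updated_record(
    record: Optional[CacheRecord],
    current_sha: str,
    mtime: float,
    size: int,
    now: float,
    human_reconfirmed: bool = False,
) -> CacheRecord:
    """Advance the baseline only when clean or after a human re-confirmation."""
    if record is None:
        # First sighting: never machine-confirmed, so not "clean".
        return CacheRecord(now, mtime, size, current_sha, "unverified")
    if current_sha == record.last_verified_sha or human_reconfirmed:
        return CacheRecord(now, mtime, size, current_sha, "clean")
    # Drift detected: hold the old baseline so the proposal keeps firing.
    return CacheRecord(now, mtime, size, record.last_verified_sha, "drifted")
```

Holding `last_verified_sha` on drift is the key design choice: the cache can never silently absorb the change it was meant to flag.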
K1 previously treated a missing or unreadable 5a drift baseline as a silent
PASS, so a registry whose content was never machine-confirmed could read as
verified. K1 now emits a skipped-no-baseline CONDITIONAL warning ("manual human
confirmation required") for cold-start (no cache record / drift_status=
"unverified"), baseline-loss, and resolution-failure (project_id unresolvable
while a drift cache exists), distinguishing genuine cold-start from baseline
loss in the message. A resolution failure also prints a stderr warning, and a
new --project-id flag lets CI or teammates supply the cache key explicitly.

The verdict now reports ran-pass / no-op (R3/C2/C4) / skipped-no-baseline
distinctly; pass_count counts only ran-pass checks so a flat count can no
longer masquerade as full content verification.

K4 now accepts committed locations seeded in proposal mode (symmetric with S2's
id seeding), so a proposed sot_location colliding with a committed entry's
location fails the gate.

Adds tests for cold-start / unverified / baseline-loss / resolution-failure K1
outcomes, the --project-id override, the distinct verdict buckets, the
proposal-vs-committed K4 collision, and the --exec-drift-checks paths
(non-zero exit, timeout, OSError) all resolving to CONDITIONAL.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
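
The distinct verdict buckets described above can be sketched as a single classification pass. This is a simplified model under stated assumptions: `build_verdict` here is a hypothetical stand-in taking one status string per check, the status names other than `skipped_no_baseline` are invented for the sketch, and the output keys mirror the bucket names in the message rather than the exact JSON the script emits.

```python
from typing import Dict

# The inert checks named in the commit message.
NO_OP_CHECKS = frozenset({"R3", "C2", "C4"})


def build_verdict(results: Dict[str, str]) -> dict:
    """Classify per-check statuses into buckets and derive the overall verdict."""
    buckets = {"ran_pass": [], "no_op": [], "skipped": [], "warnings": [], "failures": []}
    for check_id in sorted(results):
        status = results[check_id]
        if status == "fail":
            buckets["failures"].append(check_id)
        elif status == "warn":
            buckets["warnings"].append(check_id)
        elif status == "skipped_no_baseline":
            buckets["skipped"].append(check_id)
        elif status == "pass":
            # Inert checks are reported separately and never counted as passed.
            target = "no_op" if check_id in NO_OP_CHECKS else "ran_pass"
            buckets[target].append(check_id)
        else:
            raise ValueError(f"unknown status for {check_id}: {status}")
    if buckets["failures"]:
        verdict = "FAIL"
    elif buckets["warnings"] or buckets["skipped"]:
        verdict = "CONDITIONAL"
    else:
        verdict = "PASS"
    return {"verdict": verdict, "pass_count": len(buckets["ran_pass"]), **buckets}
```

With this shape, a cold-start K1 yields CONDITIONAL with zero failures, and `pass_count` reflects only checks that substantively ran and passed.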
General/cascading conventions recall applied no type filter on UNKNOWN intent
because get_target_types(UNKNOWN) returns [] → effective_types=None. After a
project runs reindex, sot_entry JSON blobs surfaced in ordinary rule/guideline
recall and per-turn injection.

Add a must_not guard in MemorySearch.search(): when searching the conventions
collection without an explicit memory_type and without a caller-provided
must_not_types, inject sot_entry into must_not_conditions at the Qdrant layer.

Explicit memory_type=["sot_entry"] (the aim-sot consult path) bypasses the
guard and continues to retrieve sot_entry records as before.
…ters

Cursor: change hook event from preCompact to stop (the actual end-of-turn
event per spec §4 and _CURSOR_HOOK_MAP). Update SKILL.md hooks.json snippet
and adapter docstring accordingly.

Gemini: add AfterAgent→Stop mapping to _GEMINI_HOOK_MAP (was absent); change
adapter from PreCompress to AfterAgent. Fix resolve_cwd to prefer
GEMINI_PROJECT_DIR env var before falling through to stdin cwd and GEMINI_CWD
(BP-032 §Finding 1). Update SKILL.md hooks.json snippet.

Codex: no event change needed (Stop was already correct). Add loop-guard
documentation to all three Wave-2 adapters (no known CLI-level loop-active
field; propose-only design is structurally loop-free per BP-032 §Finding 4).

Tests: add per-adapter event-contract assertions (T-CA11, T-GA11, T-CD11)
confirming native event → expected canonical name → validates against
VALID_HOOK_EVENTS. Add GEMINI_PROJECT_DIR precedence test (T-GA12).
Document the M3 DD-D verdict outcome buckets: ran_pass (substantive checks
that passed), no_op (inert checks: R3/C2/C4), and skipped (checks that
could not run due to missing drift baseline). pass_count now reflects
ran_pass only — inert and skipped checks are not counted as passed, so a
human reading pass_count knows exactly how many checks substantively verified
content.

Update K1 taxonomy description and soft-check ruling: K1 on no baseline
(cold-start, baseline-loss, project-id resolution failure) now emits a
skipped_no_baseline CONDITIONAL warning rather than silently passing as
if content were verified.
User-facing documentation for the aim-sot feature: a [Unreleased]
CHANGELOG entry and a new docs/AIM-SOT.md covering the registry model
and schema, the three skill modes (consult / detect-propose / verify),
the propose-only + HITL verify-gate guarantee, opt-in-OFF-by-default
enablement instructions for all four CLIs, and the 5a/5b runtime caches.
Behavior described reflects the post-cycle-1-fix state: propose-only,
fail-open triggers, cold-start unverified, baseline held on drift.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…red cache

Fix two data-safety defects in the 5b reindex path of the SOT detect-propose
engine, plus documentation and discovery-scan precision:

- An unquoted registry last_verified (e.g. 2026-06-01) parses as a YAML-native
  datetime.date and lands in the reindex payload. json.dumps raised TypeError on
  it, failing the entire reindex so registry_sha never advanced and the rebuild
  retried-and-failed every run (also disabling the throttle via force_recheck).
  Serialize with default=str so dates become isoformat strings.

- When existing points were deleted but every store_memory raised, the reindex
  returned success with stored=0, letting the caller advance registry_sha over an
  emptied-not-restored cache with no retry. Return failure when the prepared set
  is non-empty but nothing was stored.

- Document that cold-start unverified is an intentionally-transient in-session
  marker; tighten the human-reconfirm heal-condition comment to state the strict
  day-granular comparison; emit a stderr note when discovery truncates at the
  directory cap; correct the empty-registry branch comment.

Adds companion tests for each fix (date serialization, delete-then-all-stores-
fail, drift-cache temp cleanup on dump error, reindex-lock release on exception).
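
The date-serialization defect is easy to reproduce with the standard library alone. The sketch below assumes a payload shaped like a parsed registry entry (the entry id is invented) and uses a hypothetical `serialize_entry` helper; it shows why plain `json.dumps` fails on a `datetime.date` and how `default=str` turns it into an ISO-format string.

```python
import datetime
import json

# An unquoted YAML date such as 2026-06-01 is loaded as datetime.date.
payload = {"id": "billing-api", "last_verified": datetime.date(2026, 6, 1)}


def serialize_entry(entry: dict) -> str:
    """Serialize a registry entry, stringifying values JSON cannot encode."""
    # default=str is called only for otherwise-unserializable objects;
    # str(datetime.date) is the ISO form YYYY-MM-DD.
    return json.dumps(entry, default=str, sort_keys=True)


try:
    json.dumps(payload)
except TypeError as exc:
    print(f"plain json.dumps fails: {exc}")

print(serialize_entry(payload))
```

Quoting the date in the registry would also avoid the problem, but handling it at serialization time means a human-authored file never has to change.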
FV-1: cond_checks is now built first from non-skipped_no_baseline warnings, so a
mixed-baseline registry (hash-drift K1 + unverified K1 in the same run) correctly
lands K1 in cond, not skipped (DD-D). A companion test fails on the old logic.

V-LOW-1 (message-only): K1 cold-start vs baseline-loss label for the
project_id_resolved path now uses bool(components) — this project's own cache —
instead of the global _drift_state_populated() glob, preventing multi-project
mislabeling. Companion test and updated test_K1_baseline_loss_message.

V-LOW-2: _sha256_short returning None (file exists but unreadable) now emits a
K1 CONDITIONAL warning instead of silently passing. Companion test added.

FV-2: add test_K1_empty_sha_conditional (comp present, last_verified_sha="" →
CONDITIONAL "no recorded content hash" — existing behavior, test was missing).

FV-3/FV-4 (NITs): update test_verdict_conditional docstring; add dead-code comment
to _build_verdict skipped_checks - fail_checks step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…recall

Use `not memory_types` instead of `memory_types is None` so a caller passing
`memory_type=[]` also triggers the sot_entry exclusion (defense-in-depth;
unreachable today as upstream coerces []→None, but blocks future regressions).

Strengthen test_explicit_must_not_types_caller_override_respected to assert
sot_entry absent from caller-supplied exclusion list. Add
test_empty_list_memory_type_conventions_excludes_sot_entry for the [] case.
Note test_rule_and_guideline_types_unaffected as representative for port/structure
variants.
…ts (cycle-2)

- F-ADP-1: replace wrong verify JSON example (ran_pass=12, R1/R4 in ran_pass,
  fail_count:0 vs failures:[R1]) with a verified clean-run shape: verdict=PASS,
  ran_pass=13 (incl. R2), pass_count=13, failures/warnings empty.
- F-ADP-2: rewrite T-CA11 to spy on normalize_cursor_event via adapter.main(),
  asserting the adapter wires event_name='stop'; fails on preCompact reversion.
- F-ADP-3: make three Gemini tests hermetic — strip GEMINI_PROJECT_DIR + GEMINI_CWD
  so tests pass regardless of whether those env vars are set in the caller's env.
- F-ADP-4: align adapter module docstrings with exact SKILL.md section headings
  (Codex/Cursor/Gemini § references).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A flat --registry path (not under <root>/.sot/) makes
_project_root_from_registry return None, which reached the path-resolving
checks (R1/R4/C3/discovery/K1) unguarded and raised TypeError. A validation
gate must emit a verdict, not traceback.

Resolve a fallback root from the registry's own directory when the project
root is None (mirroring the detect-propose guard) and pass it to the check
fan-out, so every consumer receives a usable root. Add a regression test
asserting the flat-registry path emits a structured FAIL rather than crashing.
…jection

The 5b reindex stored derived sot_entry rows with
source_hook="aim_sot_detect_propose", but the core payload allow-list
(src/memory/validation.py valid_hooks) had no SOT value, so every write was
rejected by validation and the cache stayed empty. store_memory raises
ValueError("Validation failed: ...") on rejection; the reindex inner loop
swallowed it via a bare except, so a 0-row rebuild was misreported by
cmd_reindex as "store unreachable" (exit 0) — sending operators after a phantom
connectivity issue.

- Add "aim_sot_detect_propose" to the core valid_hooks allow-list.
- Distinguish validation rejection from store-unreachable: ReindexResult gains a
  reason field; the inner loop counts ValueError("Validation failed") rejections;
  cmd_reindex reports the rejection accurately and exits non-zero, while a
  genuinely unreachable store still exits zero and leaves the cache intact.

Regression tests hit the real memory.validation.validate_payload end-to-end (no
mocked store): the SOT source_hook is accepted, a bad hook is still rejected, a
row persists through the real allow-list via the reindex path, and the
validation-rejection path reports the correct reason and a non-zero exit.
detect-propose run bailed at a "No registry found" early-return before the
discovery scan whenever no .sot/registry.yaml existed, with a circular message
("Run aim-sot detect-propose to create one" — you are running it). A project
could not be bootstrapped from zero even though the discovery engine works.

- Add _run_cold_start_discovery: when no registry exists, run the discovery scan
  and emit candidate proposals to stdout, rooted at the conforming project root
  or the current working directory (flat/absent path). Kept separate from the
  registry-present drift path (no drift detection, 5b reindex, or 5a cache).
- Stay propose-only: the registry is never created or written; the empty-state
  message now points to the bootstrap steps instead of being circular.
- Document the first-run/bootstrap flow in SKILL.md and correct the stale
  registry template header about the propose -> verify -> approve gate.

Regression tests: cold-start emits discovery candidates via the cwd fallback and
creates no .sot/registry.yaml; the empty-state message is a bootstrap hint, not
the circular bail.
consult added --json and --registry on the top-level parser, so the
documented form (consult list --json) failed with 'unrecognized
arguments' — only the pre-subcommand form (consult --json list) worked.
detect-propose and verify both accept these flags after the subcommand.

Move both flags onto the subcommands via a shared parents=[common]
parser so consult list --json / consult get <id> --registry PATH work,
consistent with the sibling scripts and the SKILL.md example. Existing
test invocations updated to the post-subcommand ordering; regression
tests (C24-C27) cover the post-subcommand forms and assert the
pre-subcommand ordering is now rejected.
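
The shared-parent pattern this commit describes is standard argparse. The sketch below is a reduced illustration, not the actual consult parser: `build_parser` is a hypothetical name, only the `list` and `get` subcommands are modelled, and the positional name `entry_id` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a consult-style parser whose flags live on each subcommand."""
    # add_help=False avoids a duplicate -h when used as a parent parser.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--json", action="store_true")
    common.add_argument("--registry")

    parser = argparse.ArgumentParser(prog="consult")
    subcommands = parser.add_subparsers(dest="command", required=True)
    subcommands.add_parser("list", parents=[common])
    get = subcommands.add_parser("get", parents=[common])
    get.add_argument("entry_id")
    return parser


args = build_parser().parse_args(["get", "billing-api", "--registry", ".sot/registry.yaml"])
print(args.command, args.entry_id, args.registry, args.json)
```

Because the top-level parser no longer defines `--json` or `--registry`, the pre-subcommand form (`consult --json list`) is rejected, which is the behavior the regression tests assert.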
…low-up)

A flat --registry override makes _project_root_from_registry return None, so
verify resolved declared locations against the registry's own directory (the
DEFECT-3 fix). But _run_all_checks still ran C1 auto-discovery unconditionally
against that directory, firing spurious "discovered component(s) not registered"
warnings. Thread a discover flag (discover=project_root is not None) so verify
skips discovery for a flat root, matching detect-propose's M5 skip-discovery
contract; path resolution for R1/R4/C3/K1 is unchanged. Add the missing DEFECT-3
CHANGELOG entry.

Tests: test_flat_registry_override_skips_discovery (no spurious C1) and
test_conforming_registry_still_discovers_C1 (conforming path still discovers).
Add the authoring loop that categorizes a project's boundaries by type
and writes gradeable source-of-truth entry descriptions:

- SKILL.md ## Authoring: categorize -> per-type checklist -> describe ->
  D1-D4 grade -> coverage + schema-bind pre-emit gates -> emit
- references/authoring-guide.md: per-type categorization signals,
  canonical-parts checklists, the D1-D4 description rubric, the
  registry-entry emit template (kind/boundary_type enums inline), and a
  persistence + compute-tier prompt for multi-service stacks
- references/grading-exemplars.md: contrastive PASS/WEAK/FAIL examples
- tests/test_gc08_authoring_guidance.py: structural contract test,
  including an all-enum kind-token validity guard

Guidance only; no engine or schema changes.
Add a 'digest' subcommand to aim_sot_consult.py that renders the
registry as a compact, line-capped summary: one line per component
(id, kind, owner, sot_location) plus a drift rollup (clean / N stale /
unverified). Truncates to a count + pointer beyond 200 lines; emits
nothing for an empty registry. Reads the derived entry index via the
existing read-through path; read-only, registry never written.

Add tests/test_sot_consult_digest.py (19 hermetic tests) covering the
render format, the three-way drift rollup across the full drift_status
enum, line-cap truncation, empty/no-registry handling, JSON output, and
a static never-writes scan.
Add standalone, opt-in session-start hooks for Claude, Codex, Cursor, and
Gemini that inject the SOT registry digest as ambient context at session
start. Each hook invokes the consult digest mode (read-only) and emits the
summary into its CLI's session-start channel; all are fail-open (a hook
failure never blocks the session) and gated on registry presence (no
registry = not opted in). The hooks ship unregistered — per-CLI opt-in
instructions are documented in the skill. Add hermetic tests for all four.
Add a Lifecycle section to the aim-sot skill documenting the create
(bootstrap) and update flows end-to-end — discover, author, verify, apply
— with the cross-cutting invariants (propose-only, verify gates apply,
human applies, never auto-rewrite). Point the bootstrap step at the
Authoring guidance instead of leaving the description fields to be written
by hand. Documentation only; no engine change.
WB Solutions added 4 commits June 14, 2026 21:26
…ross all CLIs

Register the SOT ambient-digest (session-start) and drift hooks into the installer's project-level config writers for Claude, Gemini, Cursor, and Codex. Gated by AI_MEMORY_SOT_HOOKS (default on; set to off to skip registration). Hooks self-guard on .sot/registry.yaml presence, so projects without a registry are unaffected.

- Claude: generate_settings.generate_hook_config (digest->SessionStart resume|compact, drift->Stop); merge_settings reuses it, so existing installs upgrade idempotently (dedup-safe, dead-hook-sweep-safe).
- Gemini/Cursor/Codex: write_*_config dict entries with each CLI's existing session-start + drift event and matcher shape.
- Tests: per-CLI registration, gate-off suppression, re-merge idempotency, dead-hook strip-survival, and langfuse+sot Stop combination.
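
The AI_MEMORY_SOT_HOOKS gate can be sketched as a one-line predicate. Per the opt-out rule documented in this PR, only the token "off" (case-insensitive) disables registration, so false/0/no leave hooks on. `sot_hooks_enabled` is a hypothetical helper name, and the installer's actual parsing (for example, whether it trims whitespace) is not shown here.

```python
import os
from typing import Mapping, Optional


def sot_hooks_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only when AI_MEMORY_SOT_HOOKS is 'off' (any letter case)."""
    env = os.environ if environ is None else environ
    # Default on: an unset variable, or any value other than "off", keeps hooks.
    return env.get("AI_MEMORY_SOT_HOOKS", "").lower() != "off"
```

Accepting an optional mapping keeps the predicate testable without mutating the process environment.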
…on/endpoint SOT row

Lifecycle Create step 3: a fresh-registry verify is CONDITIONAL with 0
failures, not PASS — cold-start K1 skipped_no_baseline and advisory (C1)
warnings are expected and do not block apply; literal PASS requires an
established drift baseline. The prior 'all 16 checks must pass' was
unachievable on the bootstrap the step describes.

authoring-guide 2.3 Service/API: add 'API version + served endpoint'
canonical part (api/concern; openapi info.version + servers[]) so the
published contract revision and base URL are declared as a source of truth.
test_idempotency_no_duplicates hard-coded the pre-feature counts (1
SessionStart group, 1 Stop group). With the SOT digest (SessionStart) and
drift (Stop) hooks registered by default (AI_MEMORY_SOT_HOOKS on), the
merged config has 2 SessionStart groups and 2 Stop groups; the merge stays
idempotent across a double run (dedup by command). Update the two count
assertions to match; per-group hook counts are unchanged.
…exclusion

- Rewrite Stop Hook §, Digest Session-Start Hook §, and Trigger Hooks §
  in SKILL.md: hooks are registered by default on install (alongside
  core ai-memory hooks). Reconcile BP-032 explicitly: portability concern
  is bounded — hooks self-guard on .sot/registry.yaml presence (no-op in
  non-SOT projects) and the settings merge is backup + atomic + non-clobbering.
  Supersedes stale "opt-in/never auto-registers" wording.
- Add MED-1 note: AI_MEMORY_SOT_HOOKS=off prevents adding hooks on install
  but does not remove already-registered hooks (settings merge is
  append/base-wins). Document opt-out token: only "off" (case-insensitive)
  disables; false/0/no leave hooks on.
- Update all 9 hook/adapter module docstrings to match default-on policy.
- Harden sot_entry auto-exclusion in search.py (H1 guard): append
  sot_entry to must_not_types unconditionally for untyped conventions
  queries (union), so a caller passing an unrelated must_not_types cannot
  resurface sot_entry rows. Explicit memory_type=["sot_entry"] consult
  path is unaffected.
- Update test_explicit_must_not_types_caller_gets_sot_entry_appended to
  assert the new union behavior.
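
The union behavior of the H1 guard can be sketched independently of Qdrant. This is an illustration only: `effective_must_not` is a hypothetical helper, the real guard lives inside MemorySearch.search() and builds Qdrant must_not conditions rather than returning a list, and the collection name "other" is a placeholder.

```python
from typing import List, Optional, Sequence


def effective_must_not(
    collection: str,
    memory_types: Optional[Sequence[str]],
    must_not_types: Optional[Sequence[str]],
) -> List[str]:
    """Compute the exclusion list for a search, always hiding sot_entry
    from untyped conventions queries."""
    excluded = list(must_not_types or [])
    # `not memory_types` covers both None and [] (the defense-in-depth case).
    if collection == "conventions" and not memory_types and "sot_entry" not in excluded:
        excluded.append("sot_entry")  # union with whatever the caller supplied
    return excluded
```

An explicit `memory_types=["sot_entry"]` request, as used by the consult path, skips the guard and still retrieves the derived rows.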
WB Solutions added 6 commits June 15, 2026 02:47
…fixes

consult read the 5b derived memory cache first and returned it whenever
non-empty, with no binding to the committed .sot/registry.yaml. After any
registry edit, or when another project's rows shared the group_id, it
shadowed the file with stale or cross-state entries. consult now uses the
cache only when the 5a drift cache's registry_sha (advanced solely on a
successful reindex) matches the committed file's SHA; otherwise it falls
back to the committed file. consult stays strictly read-only.

Also:
- verify --proposal flags a proposal lacking an 'entries' key instead of
  silently verifying it as an empty set.
- detect-propose run prunes drift-cache records for components no longer in
  the committed registry (parity with the 5b reindex prune).
- SKILL.md 5b-cache field name corrected to type=sot_entry to match the code.

Tests: realistic-registry regression for the consult staleness fix (fails
without the gate), proposal wrong-shape coverage, and drift-cache orphan
prune coverage.
aim-sot's engine ships under _ai-memory/skills/ with no full .claude/skills/
copy, so installed projects could not discover it as a skill even though its
SOT session and drift hooks were registered. deploy_ai_memory_skills now
generates a thin discovery shim in .claude/skills/ for aim-* skills that live
under _ai-memory/skills/ without a full copy. It runs on every install
(matching aim-sot's always-on SOT hooks), never clobbers a skill that already
has a full copy, and is idempotent on re-install. The aim-* prefix excludes
oversight-internal parzival-save-* skills, which are correctly absent from an
end-user project.

Document the AI_MEMORY_SOT_HOOKS opt-out as a commented line in
docker/.env.example.

Add tests/test_install_aim_sot_skill_surface.py exercising the real
deploy_ai_memory_skills: aim-sot is surfaced as a shim, a full-copy sibling
stays unshimmed, parzival-save-* is not surfaced, and re-install is idempotent.
Lane A (_ai-memory/skills/aim-sot/):
- Guard consult's detect_propose sibling import; when the freshness helpers
  are unavailable, _cache_is_fresh now treats the cache as not fresh so consult
  still serves the committed registry file instead of failing.
- Drop the unreachable default in verify's proposal loader (the missing-entries
  guard already returns), reading data["entries"] directly.
- Rework the A1 consult-freshness regression test to drive the real CLI path
  (consult list --json) and fail on the staleness assertion rather than a
  missing-symbol error when run against the pre-gate code.
- Add an A3 case asserting throttle-skipped components that remain in the
  registry survive the orphan prune; add a consult test covering the
  committed-file fallback when the sibling helpers are unavailable.

Lane B (scripts/install.sh):
- Clarify the shim "no clobber" comment (the skip-guard protects full copies;
  the shim itself is rebuilt every run) and document the shim loop's ordering
  dependency on the canonical-files copy.
- Extend the install idempotency test to change the canonical SKILL.md between
  deploys and assert the regenerated shim reflects the change.
…5b.py

A1 updated _try_memory_cache(registry_path, project_id) signature;
align the three direct call sites in T-CR1/T-CR2/T-CR3 to pass "test-project".
T-CR4 uses patch.object and is unaffected.
…consult freshness

Non-SOT writes now pass exempt_handle_pii=False to scanner; update test assertions
to assert the correct call signature as regression guard.
… _scroll_sot_payloads seam

C4-1: _cache_is_fresh now requires both SHA uniformity and row-count equality with
the committed file's entry count. A partial reindex (stored < total due to
validation rejection) leaves all rows stamped with the committed SHA but row count
short; the old SHA-only check trusted that subset forever. Fix: _load_entries parses
the committed file once when payloads exist, passes len(file_entries) to
_cache_is_fresh, and reuses file_entries as the fallback return value (no second
YAML parse). New test: 2-of-3 rows stamped committed SHA → file fallback, all 3
committed entries returned.

C4-2: F2 refactor made _load_entries bypass _try_memory_cache (calls
_scroll_sot_payloads directly), orphaning both the helper and its tests. Removed
_try_memory_cache; rewired tests/test_sot_consult_5b.py to the _scroll_sot_payloads
seam via monkeypatch (mirrors test_a1_consult_freshness.py). Updated
test_a1_consult_freshness.py::test_fresh_cache_used_preserves_drift_status to
supply all 12 enriched rows (cardinality guard requires full-set cache).

C4-3: Added one-line comment at the store_memories_batch scanner loop noting
exempt_handle_pii is intentionally omitted (SOT uses per-row store_memory; batching
SOT rows would silently re-redact owner handles). Comment only, no logic change.
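
The C4-1 freshness rule (SHA uniformity plus row-count equality) can be sketched as a small predicate. This is a simplified model: `cache_is_fresh` stands in for the private `_cache_is_fresh`, and the per-row key `registry_sha` is an assumption about how cached rows are stamped.

```python
from typing import Mapping, Sequence


def cache_is_fresh(
    rows: Sequence[Mapping[str, object]],
    committed_sha: str,
    committed_entry_count: int,
) -> bool:
    """Trust the derived cache only when it fully mirrors the committed file."""
    # A partial reindex leaves fewer rows than entries; never trust a subset.
    if not rows or len(rows) != committed_entry_count:
        return False
    # Every row must be stamped with the committed registry's SHA.
    return all(row.get("registry_sha") == committed_sha for row in rows)


rows = [{"id": "a", "registry_sha": "abc"}, {"id": "b", "registry_sha": "abc"}]
print(cache_is_fresh(rows, "abc", 3))  # 2-of-3 rows: fall back to the file
```

When the predicate returns False, the caller falls back to the entries already parsed from the committed file, so a stale or partial cache can only cost a cache miss, never a wrong answer.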
Hidden-History merged commit bcdf266 into main on Jun 16, 2026
14 checks passed
Hidden-History deleted the feat/aim-sot-wave1 branch on June 16, 2026 03:32