
fix(search): guard dual-substrate lexicon levers (opt-in) + index .asc AsciiDoc #40

Merged
RioPlay merged 3 commits into main from fix/lexicon-additive-guard
Jun 20, 2026

@RioPlay RioPlay commented Jun 20, 2026

Owner

Summary

The dual-substrate lexicon levers (PR #38/#39) shipped on by default, but external benchmarking showed they are neutral-to-negative on every external repo tested with natural queries (rustfmt, Go, flask, TS, prose): for example, prose MRR 0.336→0.166 and tanstack 0.104→0.007 on the same base. The original wins were real but confined to engineered zero-overlap probes and a dense base, which the product path doesn't match. This PR makes the levers safe and opt-in, and fixes a related indexing bug found while building the prose benchmark.

fix(search): guard the levers, flip to opt-in. query_index now always computes the baseline ranking as a safety floor; expansion is DF-gated (drops common-word noise) and fused base-weighted (it can supplement but never evict a confident base hit); the PPMI rerank is gated to code-shaped queries. The levers are now opt-in via ADEN_LEXICON_ON; the default is the plain baseline, which scored best across everything measured. With the guards in place, enabling the levers reaches parity with the baseline instead of cratering.
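
For readers who want the shape of the guard without reading the diff, here is a minimal sketch in Python. It is not the code in `query_index`: the function names, the DF cutoff, the fusion weight, and the confidence threshold are all illustrative assumptions, and scores are assumed to be normalized to [0, 1].

```python
def df_gate(expansion_terms, doc_freq, n_docs, max_df_ratio=0.2):
    """Drop expansion terms that occur in too large a share of documents.

    `max_df_ratio` is a hypothetical cutoff, not the project's value.
    """
    return [t for t in expansion_terms
            if doc_freq.get(t, 0) / n_docs <= max_df_ratio]


def fuse_base_weighted(base, expanded, k, base_weight=0.7, confident=0.8):
    """Fuse two {doc_id: score} rankings and return the top-k doc ids.

    Baseline hits scoring at or above `confident` are pinned first, in
    baseline order, so expansion results can only fill the slots after them.
    Everything else is ordered by a baseline-weighted blend of both scores.
    """
    pinned = sorted((d for d, s in base.items() if s >= confident),
                    key=lambda d: (-base[d], d))
    pinned_set = set(pinned)
    fused = {
        d: base_weight * base.get(d, 0.0)
           + (1.0 - base_weight) * expanded.get(d, 0.0)
        for d in set(base) | set(expanded)
        if d not in pinned_set
    }
    rest = sorted(fused, key=lambda d: (-fused[d], d))
    return (pinned + rest)[:k]
```

Pinning is one simple way to get the "supplement but never evict" property: a confident baseline hit can only drop out of the top-k if the baseline alone would also have dropped it.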

fix(index) — index .asc AsciiDoc. .asc was in SECRET_EXTS (PGP-armor collision), so AsciiDoc books that use it (Pro Git, many docs repos) indexed empty with a green health score. It's dual-use, so it's now content-gated: a real armored PRIVATE key in a .asc is still caught by content_has_high_confidence_secret; prose indexes normally.
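
The "judge by content, not name" rule can be sketched roughly as follows, in Python for brevity. The real check is `content_has_high_confidence_secret` in the Rust code, whose signature is not shown here; `should_index`, `has_pgp_private_key`, and the contents of `SECRET_EXTS` below are hypothetical stand-ins.

```python
import re
from pathlib import PurePosixPath

# Hypothetical skip-list; the real SECRET_EXTS in filter.rs is not shown here.
# Note that "asc" is deliberately absent: it is handled by content instead.
SECRET_EXTS = {"pem", "key", "p12"}

# ASCII-armor header line of an OpenPGP private key block.
PGP_PRIVATE_KEY = re.compile(
    r"^-----BEGIN PGP PRIVATE KEY BLOCK-----\s*$", re.MULTILINE
)


def has_pgp_private_key(text: str) -> bool:
    """True if the text contains an armored PGP private key header line."""
    return PGP_PRIVATE_KEY.search(text) is not None


def should_index(path: str, text: str) -> bool:
    """Skip files with a secret-only extension or armored private-key content."""
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    if ext in SECRET_EXTS:
        return False
    return not has_pgp_private_key(text)
```

Under these assumptions an AsciiDoc chapter named `ch1.asc` is indexed, while a `.asc` file holding an armored private key is still skipped.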

test(bench) — adds scripts/lexicon_ab_bench.py (external ON/OFF A/B harness) + an external prose query set, so any future "on by default" decision is gated on real repos, not in-tree fixtures.
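
As a reference for the MRR figures quoted in this PR, here is a minimal sketch of an ON/OFF comparison using the standard mean reciprocal rank definition. It does not reproduce the interface of `scripts/lexicon_ab_bench.py`; the function names and the input shape are assumptions.

```python
def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant result, or 0.0 if none is returned."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ab_delta(off_runs, on_runs):
    """Compare the same query set with the levers OFF and ON."""
    off = mean_reciprocal_rank(off_runs)
    on = mean_reciprocal_rank(on_runs)
    return {"off": off, "on": on, "delta": on - off}
```

A negative `delta` on an external repo is the kind of regression that should block a future "on by default" decision.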

Test plan

  • cargo test --workspace green (via the pre-push ci-check gate)
  • cargo clippy --workspace clean
  • cargo fmt --all applied

License checklist

  • No dependency changes — license checklist not applicable

RioPlay (Ernest Hamblen) added 3 commits June 19, 2026 16:14
The lexicon levers (PR #38/#39) shipped on-by-default but regress retrieval
on external repos with natural queries: neutral-to-negative on
rustfmt/Go/flask/TS/prose (e.g. prose MRR 0.336->0.166, tanstack
0.104->0.007 on the same base). The wins were real but confined to
engineered zero-overlap probes and a dense base.

- query_index always computes the baseline ranking as a safety floor;
  expansion is DF-gated (drops common-word noise) and fused base-weighted
  (supplements but never evicts a confident base hit); rerank is gated to
  code-shaped queries. The levers reach parity instead of cratering.
- levers are now opt-in (ADEN_LEXICON_ON); default is the baseline.
- add scripts/lexicon_ab_bench.py (external A/B harness) and an external
  prose query set so re-enabling by default is gated on real repos.

filter.rs SECRET_EXTS listed "asc", so AsciiDoc books using the .asc
extension (Pro Git and many doc repos) indexed empty with a green health
score. .asc is dual-use (AsciiDoc + PGP armor); judge by content, not
name: drop it from the skip-list and add a PGP-private-key content check
so a real armored key in a .asc is still caught while prose indexes.

The blast-radius eval (graph caller-edges vs a text-scan ground truth of
NAME( call sites, file-level) existed only as a stale .pyc. Reconstruct it
as a committed, reusable harness that points at any external repo and
SEPARATES two concerns:
  - understand name-resolution accuracy (did `understand NAME` land on the
    exact symbol, or fuzzy-match a substring superset), and
  - blast-radius precision/recall on the correctly-resolved symbols.

Auto-discovers curated callable gold symbols (no dunders/generic names,
called-but-not-ubiquitous), excludes test dirs to match aden's production
extraction scope. On flask: resolution 0.95, blast-radius precision 0.44 /
recall 0.61. The method-call-resolution gap depresses recall on OO/Python
vs the 0.99 on free-function Rust code.
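
The file-level comparison described in the last commit can be sketched as below. This is an illustration, not the committed harness: the function names and the regex are assumptions, and symbol discovery, name resolution, and test-directory exclusion are left out.

```python
import re


def text_scan_callers(name, files):
    """Return the paths in {path: source_text} that contain a `NAME(` call site.

    The lookbehind prevents matching a longer identifier that merely ends in
    NAME. This naive scan also matches the definition line itself, which a
    real harness would need to handle.
    """
    pattern = re.compile(r"(?<![A-Za-z0-9_])" + re.escape(name) + r"\s*\(")
    return {path for path, src in files.items() if pattern.search(src)}


def precision_recall(predicted, truth):
    """File-level precision and recall of predicted callers against the scan."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall
```

Here `predicted` would be the set of files reached through the graph's caller edges for a correctly resolved symbol, and `truth` the text-scan result for the same name.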
@RioPlay RioPlay merged commit 15339eb into main Jun 20, 2026
6 checks passed
RioPlay added a commit that referenced this pull request Jun 20, 2026
commands.adoc:
- asm --depth default: 3 → 2 (matches code)
- mcp install: add missing --surface <essential|standard|full> flag
- viz: add missing --scope and --resolution flags; remove bogus -j alias
- heal: add structured flag table (--propose/--fix/--gc/--since/--apply/--watch)
- status: document savings estimate output block

ai-integration.adoc:
- Replace two-tier Core/Extended model with three-tier Essential/Standard/Full
- Fix tool assignments: search/list/communities/impact-diff were in wrong tier
- Fix env var: ADEN_MCP_FULL=1 → ADEN_MCP_SURFACE=standard|full (legacy alias noted)
- Document --surface flag at install time

retrieval-levers.adoc:
- Fix polarity: auto-gating is OFF by default, opt-in via ADEN_LEXICON_ON
- Reframe ADEN_LEXICON_OFF as kill switch, not primary disable mechanism
- Fix rerank trigger: code_anchor_fraction → query_looks_codey (query text only)
- Fix NL-over-code behavior: expands only, does not rerank
- Document PR #39/PR #40 revert history in status section
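
One reading of the polarity described above, sketched in Python: the levers are off unless ADEN_LEXICON_ON is set, and ADEN_LEXICON_OFF overrides it as a kill switch. The precedence and the truthy-value parsing are assumptions for illustration; only the two variable names come from this PR.

```python
import os


def _truthy(value):
    """Treat unset, empty, and common 'off' spellings as false (an assumption)."""
    if value is None:
        return False
    return value.strip().lower() not in ("", "0", "false", "no", "off")


def lexicon_enabled(env=None):
    """Return True only when the levers are opted in and not killed."""
    env = os.environ if env is None else env
    if _truthy(env.get("ADEN_LEXICON_OFF")):
        return False  # kill switch wins over the opt-in (assumed precedence)
    return _truthy(env.get("ADEN_LEXICON_ON"))
```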

architecture.adoc:
- Add aden-paths node to Mermaid crate diagram
- Remove duplicate aden-mcp Phase 2 row (already shipped as Phase 0)

security-model.adoc:
- Replace CanPerform (not a valid EdgeType) with Invokes in semantics example
- Update malicious-contract-injection threat: moot since store-first (ADR-003)

.agent/quick-ref.adoc:
- Add --features watch caveat to aden watch entry

crates/aden-cli/src/commands/init.rs:
- Fix misleading "Knowledge graph built in .aden/store." message; store is
  in per-user data dir since ADR-003

docs/adr-008-current-implementation-state.adoc (new):
- Document current scope extensions beyond original ADRs: Wave 3 edges
  (Supersedes/Justifies/AssociatedWith), GEN_LOGIC_VERSION=4, store-first
  architecture, MCP three-tier surface, dual-substrate retrieval (opt-in)

Co-authored-by: RioPlay (Ernest Hamblen) <rioplay@rioplay.dev>
@RioPlay RioPlay deleted the fix/lexicon-additive-guard branch June 21, 2026 04:08