Skip to content

perf(store): upgrade fjall 2.11→3 + single-pass edge load#44

Merged
RioPlay merged 2 commits into
mainfrom
perf/fjall3-upgrade
Jun 20, 2026
Merged

perf(store): upgrade fjall 2.11→3 + single-pass edge load#44
RioPlay merged 2 commits into
mainfrom
perf/fjall3-upgrade

Conversation

@RioPlay

@RioPlay RioPlay commented Jun 20, 2026

Copy link
Copy Markdown
Owner

Summary

Two orthogonal performance fixes targeting the 250 ms floor and 32-scan bridge overhead measured in the perf audit:

fjall 2.11 → 3 upgrade (crates/aden-store/):

  • fjall 3 replaces the background monitor thread's unconditional sleep with a flume-channel wake. Store open+drop drops from ~250 ms to ~715 µs (bench: store_open_drop).
  • API migration: KeyspaceDatabase, PartitionHandleKeyspace, PartitionCreateOptionsKeyspaceCreateOptions; iterator updated to Guard + .into_inner()?; flush uses db.persist(PersistMode::SyncAll).

Single-pass edge load (crates/aden-store/src/lib.rs, crates/aden-graph/src/bridge.rs):

  • Old bridge.rs looped over all 32 EdgeType variants, issuing a full scan of the edges partition for each → 32 scans per graph load.
  • New get_all_edges() on GraphStorage (overridden in FjallStorage) scans once and buckets by the type embedded in the key.
  • Measured 20× speedup: 34 µs vs 690 µs on a seeded store (bench: edge_load/single_scan vs edge_load/per_type_loop).

Not included (deferred): Fix 2 (single storage handle per process via OnceLock) — the Arc-backed Database: Clone approach is sound, but threading the handle through open_heal_storage and cache.rs's independent Storage::open_existing call sites requires a separate, larger refactor. Filed as a follow-up.

Test plan

  • cargo test --workspace green (203 tests pass, 0 fail)
  • cargo clippy --workspace clean (0 warnings)
  • cargo fmt --all applied

License checklist (required for any Cargo.toml change)

  • cargo deny check passes
  • Compared aden licenses output against NOTICE.md dep table — no new entries missing
    • Updated: fjall 2.11.2→3.1.5, lsm-tree 2.10.4→3.1.5, byteview 0.6.1→0.10.1, lz4_flex 0.11.6→0.13.1
    • Added: byteorder-lite 0.1.0 (Unlicense OR MIT), flume 0.12.0 (Apache-2.0/MIT), sfa 1.0.0 (MIT OR Apache-2.0), twox-hash 2.1.2 (MIT)
    • Removed: value-log 1.9.0 (no longer a transitive dep in fjall 3)
    • spin 0.9.8 was already present in NOTICE.md
  • No new MPL/LGPL/GPL deps
  • No compiled-in MIT deps with named copyright holders requiring new entries
  • No dense-feature-only deps added
  • No vendored JS assets changed

RioPlay (Ernest Hamblen) added 2 commits June 20, 2026 09:48
…gle pass

Upgrade aden-store's fjall dependency from 2.11.2 to 3.1.5.  fjall 3
replaces the background monitor thread's unconditional sleep with a
flume-channel wake, cutting store open+drop from ~250 ms to ~715 µs.

API migration in fjall_store.rs:
- Type renames: Keyspace→Database, PartitionHandle→Keyspace,
  PartitionCreateOptions→KeyspaceCreateOptions
- Open: Database::builder(path).open()
- Iterator: Guard + .into_inner()? instead of Result<(key,val)>
- flush: db.persist(PersistMode::SyncAll)

New get_all_edges() on GraphStorage trait (with single-scan override on
FjallStorage) replaces bridge.rs's loop over 32 EdgeType variants — 32
full partition scans become 1, measured 20× faster on a seeded store.

NOTICE.md updated: fjall 2.11.2→3.1.5, lsm-tree 2.10.4→3.1.5,
byteview 0.6.1→0.10.1, lz4_flex 0.11.6→0.13.1; new entries for
byteorder-lite, flume, sfa, twox-hash; value-log removed (no longer
a transitive dep in fjall 3).
…load

Two criterion benchmarks covering the fjall-3 perf improvements:
- store_open_drop: measures open+drop of an existing store (was ~250 ms
  with fjall 2.11; now ~715 µs with fjall 3)
- edge_load/single_scan vs edge_load/per_type_loop: single-pass
  get_all_edges (34 µs) vs old 32-scan loop (690 µs), ~20× gain
@RioPlay RioPlay merged commit 7ed4fbf into main Jun 20, 2026
6 checks passed
@RioPlay RioPlay deleted the perf/fjall3-upgrade branch June 20, 2026 15:06
RioPlay added a commit that referenced this pull request Jun 20, 2026
* perf(graph): single-pass edge load in build_from_storage

build_from_storage (the `aden gen` graph-build path) still looped over
all 32 EdgeType variants, issuing one full scan of the edges partition
per variant.  PR #44 replaced the same pattern in bridge.rs but missed
this second call site.

Switch to get_all_edges(), which scans the edges partition once and
buckets by the type embedded in each key.  Still covers every variant,
so Demonstrates/Mentions/DefinesTerm stay included — the
build_from_storage_loads_demonstrates_mentions_defines_term guard test
passes unchanged.

* docs: fix 9 accuracy issues across README, ADRs, and command docs

Each correction verified against the code before editing:

- README: dual-substrate levers are opt-in (ADEN_LEXICON_ON), not on by
  default — util.rs:1078 shipped them off after they measured net
  neutral-to-negative on natural queries over external repos.
- architecture.adoc: asm depth default is 2, not 3 (main.rs:224);
  drop Constrains from the "active edge types" list (no emitter,
  util.rs:318 / ADR-007 §1).
- commands.adoc: --backlinks returns all incoming referencers filtered
  only by --edge-type (no containment exclusion, query.rs:992); --impact
  set is [Uses, Calls, Implements, Mutates]; intent table matches
  edge_types_for_intent (query.rs:1265) — removed emitter-less
  Constrains/Invokes, added Demonstrates, noted the always-on Calls and
  Documents->Contains append.
- AGENTS.md: ci-check runs before push (the pre-push gate), not commit.
- benchmarks.adoc: the ~250 ms fjall 2.11.2 floor was eliminated by the
  fjall 3 upgrade (PR #44); annotate the pre-upgrade latency table as
  stale rather than inventing new end-to-end numbers.
- adr-003: drop the stale fjall::Config::new API reference (gone in
  fjall 3) for an API-neutral phrasing.

---------

Co-authored-by: RioPlay (Ernest Hamblen) <rioplay@rioplay.dev>
RioPlay added a commit that referenced this pull request Jun 20, 2026
…009) (#49)

* docs(adr): ADR-009 store format change recovery (proposed)

Design note for the fjall-3 migration hazard found while testing the
post-#44 release: no graceful recovery on a store-format mismatch
(InvalidVersion), and freshness decoupled from the store so the obvious
recovery re-gens to a silently empty graph. Proposes a typed
IncompatibleVersion error, version-mismatch-only auto-rebuild on the
create path, and freshness-follows-store (empty-store guard + gen
--rebuild). Status: proposed; LSM-path implementation gated on sign-off.

* fix(store): graceful recovery from an incompatible store format

The fjall 2.11→3 upgrade (#44) changed the on-disk format with no
migration: every existing V2 store hard-errored InvalidVersion on open,
so a user's first command after upgrading failed. Deleting just store/
and re-genning made it worse — gen trusted the per-file mtime cache and
skipped everything, producing a silently EMPTY graph that status/doctor
still called "100/100 healthy".

Recovery is now automatic and seamless across every store entry point:

- aden-store: a new StoreError::IncompatibleVersion, mapped from fjall's
  typed Error::InvalidVersion in the From impl. Every OTHER fjall error
  stays on the generic Io path, so a transient I/O failure can never be
  mistaken for a recoverable format change.
- gen (cmd_gen_inner): on IncompatibleVersion, wipe the (rebuildable,
  ADR-003) store and rebuild from source with a one-line notice. Scoped
  strictly to that signal; a generic Io error still hard-fails without
  wiping. A pinned/shared $ADEN_STORE is never auto-wiped — it surfaces
  an actionable error instead (mirrors regen's caution).
- freshness follows the store: a non-existent/empty store (first gen,
  rm -rf store/, regen, or a just-recovered store) forces a full
  re-scan instead of trusting the stale gen-cache, closing the
  silent-empty trap.
- reads (ensure_fresh) and heal (cmd_heal_scan/_since) route through a
  shared recover_if_incompatible_store helper: probe the open and, on
  IncompatibleVersion only, rebuild silently — so grep/ask/understand
  and heal auto-recover instead of degrading to empty results. heal
  uses the recovery-only helper (NOT a staleness-gen), so a normal heal
  still surfaces real drift.

The proposed `gen --rebuild` flag was dropped: `regen` already provides a
discoverable clean rebuild, and the empty-store guard makes a plain
`rm -rf store/ && gen` rebuild correctly while preserving the expensive
embeddings cache.

Tests: aden-store unit test pins the InvalidVersion→IncompatibleVersion
mapping (and Io stays Io); an integration test corrupts the on-disk
version marker and asserts the typed error surfaces. Validated
end-to-end against a real V2 store (gen, reads, and heal all recover;
healthy stores still skip incrementally; embeddings survive). ADR-009
accepted.

* fix(store): address adversarial-review findings on recovery

A multi-dimension review of the recovery change surfaced eight findings;
the load-bearing ones are fixed here:

- HIGH (recover_if_incompatible_store returned `true` even when the
  rebuild FAILED): callers (ensure_fresh, heal) treated the store as
  recovered, skipped their own logic, and degraded to empty results — most
  concretely the pinned/shared $ADEN_STORE case, whose error was also
  swallowed. Now returns `true` only on a successful rebuild and prints the
  failure (e.g. the pinned-store refusal) so it is never silent.
- MED (single-file gen lost the graph): `gen <file>` on an incompatible
  store wiped the whole store but re-indexed only that one file, leaving a
  near-empty graph. The recovery arm now expands a single-file gen to the
  full project.
- MED (cmd_heal_apply uncovered): the one heal entry point without a
  recovery guard emitted a misleading "run `aden gen` first" error on an
  incompatible store. Now guarded like the scan paths.
- LOW (test temp-dir leak): the corrupt-version-marker test now cleans up
  before asserting.

Adds hermetic integration tests (CARGO_BIN_EXE_aden + isolated
ADEN_DATA_DIR) covering read-path auto-recovery and the gen recovery +
empty-store guard end-to-end.

---------

Co-authored-by: RioPlay (Ernest Hamblen) <rioplay@rioplay.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant