perf(store): upgrade fjall 2.11→3 + single-pass edge load#44
Merged
Conversation
added 2 commits
June 20, 2026 09:48
…gle pass Upgrade aden-store's fjall dependency from 2.11.2 to 3.1.5. fjall 3 replaces the background monitor thread's unconditional sleep with a flume-channel wake, cutting store open+drop from ~250 ms to ~715 µs. API migration in fjall_store.rs: - Type renames: Keyspace→Database, PartitionHandle→Keyspace, PartitionCreateOptions→KeyspaceCreateOptions - Open: Database::builder(path).open() - Iterator: Guard + .into_inner()? instead of Result<(key,val)> - flush: db.persist(PersistMode::SyncAll) New get_all_edges() on GraphStorage trait (with single-scan override on FjallStorage) replaces bridge.rs's loop over 32 EdgeType variants — 32 full partition scans become 1, measured 20× faster on a seeded store. NOTICE.md updated: fjall 2.11.2→3.1.5, lsm-tree 2.10.4→3.1.5, byteview 0.6.1→0.10.1, lz4_flex 0.11.6→0.13.1; new entries for byteorder-lite, flume, sfa, twox-hash; value-log removed (no longer a transitive dep in fjall 3).
…load Two criterion benchmarks covering the fjall-3 perf improvements: - store_open_drop: measures open+drop of an existing store (was ~250 ms with fjall 2.11; now ~715 µs with fjall 3) - edge_load/single_scan vs edge_load/per_type_loop: single-pass get_all_edges (34 µs) vs old 32-scan loop (690 µs), ~20× gain
5 tasks
RioPlay
added a commit
that referenced
this pull request
Jun 20, 2026
* perf(graph): single-pass edge load in build_from_storage build_from_storage (the `aden gen` graph-build path) still looped over all 32 EdgeType variants, issuing one full scan of the edges partition per variant. PR #44 replaced the same pattern in bridge.rs but missed this second call site. Switch to get_all_edges(), which scans the edges partition once and buckets by the type embedded in each key. Still covers every variant, so Demonstrates/Mentions/DefinesTerm stay included — the build_from_storage_loads_demonstrates_mentions_defines_term guard test passes unchanged. * docs: fix 9 accuracy issues across README, ADRs, and command docs Each correction verified against the code before editing: - README: dual-substrate levers are opt-in (ADEN_LEXICON_ON), not on by default — util.rs:1078 shipped them off after they measured net neutral-to-negative on natural queries over external repos. - architecture.adoc: asm depth default is 2, not 3 (main.rs:224); drop Constrains from the "active edge types" list (no emitter, util.rs:318 / ADR-007 §1). - commands.adoc: --backlinks returns all incoming referencers filtered only by --edge-type (no containment exclusion, query.rs:992); --impact set is [Uses, Calls, Implements, Mutates]; intent table matches edge_types_for_intent (query.rs:1265) — removed emitter-less Constrains/Invokes, added Demonstrates, noted the always-on Calls and Documents->Contains append. - AGENTS.md: ci-check runs before push (the pre-push gate), not commit. - benchmarks.adoc: the ~250 ms fjall 2.11.2 floor was eliminated by the fjall 3 upgrade (PR #44); annotate the pre-upgrade latency table as stale rather than inventing new end-to-end numbers. - adr-003: drop the stale fjall::Config::new API reference (gone in fjall 3) for an API-neutral phrasing. --------- Co-authored-by: RioPlay (Ernest Hamblen) <rioplay@rioplay.dev>
5 tasks
RioPlay
added a commit
that referenced
this pull request
Jun 20, 2026
…009) (#49) * docs(adr): ADR-009 store format change recovery (proposed) Design note for the fjall-3 migration hazard found while testing the post-#44 release: no graceful recovery on a store-format mismatch (InvalidVersion), and freshness decoupled from the store so the obvious recovery re-gens to a silently empty graph. Proposes a typed IncompatibleVersion error, version-mismatch-only auto-rebuild on the create path, and freshness-follows-store (empty-store guard + gen --rebuild). Status: proposed; LSM-path implementation gated on sign-off. * fix(store): graceful recovery from an incompatible store format The fjall 2.11→3 upgrade (#44) changed the on-disk format with no migration: every existing V2 store hard-errored InvalidVersion on open, so a user's first command after upgrading failed. Deleting just store/ and re-genning made it worse — gen trusted the per-file mtime cache and skipped everything, producing a silently EMPTY graph that status/doctor still called "100/100 healthy". Recovery is now automatic and seamless across every store entry point: - aden-store: a new StoreError::IncompatibleVersion, mapped from fjall's typed Error::InvalidVersion in the From impl. Every OTHER fjall error stays on the generic Io path, so a transient I/O failure can never be mistaken for a recoverable format change. - gen (cmd_gen_inner): on IncompatibleVersion, wipe the (rebuildable, ADR-003) store and rebuild from source with a one-line notice. Scoped strictly to that signal; a generic Io error still hard-fails without wiping. A pinned/shared $ADEN_STORE is never auto-wiped — it surfaces an actionable error instead (mirrors regen's caution). - freshness follows the store: a non-existent/empty store (first gen, rm -rf store/, regen, or a just-recovered store) forces a full re-scan instead of trusting the stale gen-cache, closing the silent-empty trap. - reads (ensure_fresh) and heal (cmd_heal_scan/_since) route through a shared recover_if_incompatible_store helper: probe the open and, on IncompatibleVersion only, rebuild silently — so grep/ask/understand and heal auto-recover instead of degrading to empty results. heal uses the recovery-only helper (NOT a staleness-gen), so a normal heal still surfaces real drift. The proposed `gen --rebuild` flag was dropped: `regen` already provides a discoverable clean rebuild, and the empty-store guard makes a plain `rm -rf store/ && gen` rebuild correctly while preserving the expensive embeddings cache. Tests: aden-store unit test pins the InvalidVersion→IncompatibleVersion mapping (and Io stays Io); an integration test corrupts the on-disk version marker and asserts the typed error surfaces. Validated end-to-end against a real V2 store (gen, reads, and heal all recover; healthy stores still skip incrementally; embeddings survive). ADR-009 accepted. * fix(store): address adversarial-review findings on recovery A multi-dimension review of the recovery change surfaced eight findings; the load-bearing ones are fixed here: - HIGH (recover_if_incompatible_store returned `true` even when the rebuild FAILED): callers (ensure_fresh, heal) treated the store as recovered, skipped their own logic, and degraded to empty results — most concretely the pinned/shared $ADEN_STORE case, whose error was also swallowed. Now returns `true` only on a successful rebuild and prints the failure (e.g. the pinned-store refusal) so it is never silent. - MED (single-file gen lost the graph): `gen <file>` on an incompatible store wiped the whole store but re-indexed only that one file, leaving a near-empty graph. The recovery arm now expands a single-file gen to the full project. - MED (cmd_heal_apply uncovered): the one heal entry point without a recovery guard emitted a misleading "run `aden gen` first" error on an incompatible store. Now guarded like the scan paths. - LOW (test temp-dir leak): the corrupt-version-marker test now cleans up before asserting. Adds hermetic integration tests (CARGO_BIN_EXE_aden + isolated ADEN_DATA_DIR) covering read-path auto-recovery and the gen recovery + empty-store guard end-to-end. --------- Co-authored-by: RioPlay (Ernest Hamblen) <rioplay@rioplay.dev>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two orthogonal performance fixes targeting the 250 ms floor and 32-scan bridge overhead measured in the perf audit:
fjall 2.11 → 3 upgrade (
crates/aden-store/):flume-channel wake. Store open+drop drops from ~250 ms to ~715 µs (bench:store_open_drop).Keyspace→Database,PartitionHandle→Keyspace,PartitionCreateOptions→KeyspaceCreateOptions; iterator updated toGuard+.into_inner()?; flush usesdb.persist(PersistMode::SyncAll).Single-pass edge load (
crates/aden-store/src/lib.rs,crates/aden-graph/src/bridge.rs):bridge.rslooped over all 32EdgeTypevariants, issuing a full scan of the edges partition for each → 32 scans per graph load.get_all_edges()onGraphStorage(overridden inFjallStorage) scans once and buckets by the type embedded in the key.edge_load/single_scanvsedge_load/per_type_loop).Not included (deferred): Fix 2 (single storage handle per process via
OnceLock) — theArc-backedDatabase: Cloneapproach is sound, but threading the handle throughopen_heal_storageandcache.rs's independentStorage::open_existingcall sites requires a separate, larger refactor. Filed as a follow-up.Test plan
cargo test --workspacegreen (203 tests pass, 0 fail)cargo clippy --workspaceclean (0 warnings)cargo fmt --allappliedLicense checklist (required for any Cargo.toml change)
cargo deny checkpassesaden licensesoutput againstNOTICE.mddep table — no new entries missing