DST harness — generative simulation suite + shared omnigraph-dst crate + cross-backend walk (tests-only; consolidates #303 + #305)#309
DST harness — generative simulation suite + shared omnigraph-dst crate + cross-backend walk (tests-only; consolidates #303 + #305)#309ragnorc wants to merge 17 commits into
Conversation
…, critic findings) Port: brings the full harness (B1-B6) onto current main (0.7.2). Verified all 5 bug guards still reproduce on 0.7.2 — its only fix (#296 recovery roll-forward convergence) addresses a DIFFERENT bug than any the harness catches. Adds test-only dev-deps (arrow-array, futures, serde_json, async-trait, proptest-state-machine). Harden (completeness-critic pass): - MATRIX.md — the coverage ledger: enumerates D1-D5 sampled/unsampled cells, names the HIDDEN dimensions found only after a miss, and explains why exhaustive sampling is impossible (path-spaces / non-enumerable schedules / open-world model). Prioritized gap backlog. - Reopen op — open/recovery sweep is now a first-class walk op (drop + reopen mid-walk), sampled across varied table states, not only at the end. ops 10/10, 0 novel violations. - Critic finding #1 (#296 cell): concurrent open / >=2 recovery sweeps on one sidecar is unsampled — the named blind spot that let #296 through. - Critic finding #2 (fault seam): the FaultAdapter wraps StorageAdapter::write_text_if_match, but the __manifest publish is a Lance MergeInsertBuilder CAS (publisher.rs:377) that never flows through it — verified empirically (cas_pct=100 write succeeds, no sidecar). So Phase-2 'manifest CAS fault injection' is off the hot path; real injection needs failpoints or a Lance object_store wrapper. This is the prerequisite for closing the #296 cell.
The StorageAdapter FaultAdapter is off the manifest-publish path, so it can't induce a recovery sidecar. Failpoints can. New tests/dst_recovery.rs (own binary — the process-global fail registry must not leak into the main, non-serial walks; verified the in-binary version broke 3 parallel tests): - recovery_rolls_forward_under_finalize_failure: mutation.post_finalize_pre_publisher leaves a real RolledPastExpected sidecar; reopen rolls it forward; the harness WHITE-BOX structural battery holds (additive vs the engine's count-only test). - concurrent_opens_converge_on_pending_sidecar (the #296 cell): an inline park-first rendezvous at recovery.before_roll_forward_publish forces two open sweeps to race one sidecar; the CAS-loser must CONVERGE, not fail the open. Non-vacuous (rendezvous panics if the race never fires); guards 0.7.2 (the failpoint was added with the #296 fix, so it can't run on true 0.7.1). MATRIX.md backlog: fault-seam (#1), #296 cell (#2), Reopen op (#3) now DONE; cross-process + generative-via-object_store remain.
… proptest-state-machine)
The #7230 lesson: a parse/merge path only ever fed clean input shipped a latent crash. Fuzz the GQ query parser (find_named_query), the schema parser (parse_schema), and the JSONL loader with arbitrary + structured + adversarial (dup-@key, reordered/extra/missing fields, wrong types, malformed JSON, orphan edges, enum-out-of-range, huge strings) input and assert they return a Result, never panic — and the graph stays usable after the abuse. proptest shrinks+persists any panic. cargo-fuzz/libFuzzer targets deferred (need nightly); this gets the same coverage on stable. All green — no parser panics found.
Runs a short seeded op sequence + the full white-box battery against a real object-store backend (s3://), not just local FS — the D5 S3 context dimension. Gated on OMNIGRAPH_S3_TEST_BUCKET (skips locally; runs in CI's rustfs_integration job), mirroring tests/s3_storage.rs. Known bugs allow-listed, novel -> fail.
…ferred with reason
…arity A DST-flavored op sequence (init -> 2x load --merge multi-fragment -> insert -> edge-free delete -> query) run against the embedded Omnigraph SDK and the omnigraph CLI subprocess on twin local graphs must agree on the final person-slug set. Fills the gap parity_matrix.rs leaves (it does single-verb CLI-local-vs-server; this adds the embedded-SDK arm + a multi-op sequence). Format-tolerant oracle (embedded to_rust_json vs CLI --json differ in wrapping): compare the set of smoke-* slug strings. The CLI wraps the same engine, so this guards the CLI transport layer (arg parse, --json serialization, --store addressing, --as actor) against divergence from embedded. Reuses support::cli(). Green.
…_dst_parity, the dst feature) Per the maintenance contract: add the DST harness to the test-coverage map — engine table rows for dst.rs (generative walks + white-box battery + regressions + statemachine shrinking + readshape + fuzz + S3 cell + replay-equality) and dst_recovery.rs (failpoint recovery / #296 cells); the cli_dst_parity cross-backend smoke in the omnigraph-cli row; a 'Running the DST harness' section (the feature flags + run commands + the test-only zero-cost dst determinism seam); and the S3 cell in the rustfs CI list. Points at dst/MATRIX.md for the coverage ledger.
…(PR-E foundation) New dev-only crate crates/omnigraph-dst (workspace member, NOT a default-member) depending on omnigraph-engine + omnigraph-compiler, wired as a [dev-dependencies] of omnigraph-engine. This forms a dev-dependency cycle (engine dev-dep -> omnigraph-dst -> engine); VERIFIED cargo accepts it (cargo metadata resolves; cargo build -p omnigraph-dst finishes) — the make-or-break gate for extracting op/model/invariants behind a Backend trait so the generative walk can run cross-backend (embedded + CLI). Modules land in the following commits.
…nto omnigraph-dst
The generative walk was hard-coupled to `&Omnigraph` (step called
db.mutate/load_jsonl/db.query directly), so it could only run embedded. Move
the reusable harness primitives into the omnigraph-dst lib behind a black-box
`Backend` trait so the SAME seeded walk can drive any context:
- backend.rs: `Backend` (load/mutate/query/optimize/repair, normalized rows)
+ `BackendError`{Engine(OmniError)|Cli{code,stderr}} + `Embedded` (wraps
Omnigraph, optional FaultAdapter, exposes db() for the white-box battery)
+ `Cli` (subprocess; binary path injected by the consumer).
- op.rs: `step<B: Backend>` (was step(&Omnigraph)); all gq standardized to a
single query named `q` so embedded names it and the CLI infers it from -e.
- model.rs: count/content/edges-traversable==model are now black-box (via
Backend::query, key parsed by suffix to survive embedded-vs-CLI key naming);
check_edges keeps its raw white-box snapshot half (embedded-only).
- invariants.rs: white-box battery unchanged (takes &Omnigraph via db());
classifier split into classify(&Finding) + classify_backend(&BackendError)
over a shared engine-signal core; run_battery takes &Embedded.
- fault.rs: FaultAdapter moved verbatim.
White-box stays embedded-only by construction; a cross-backend walk runs the
black-box oracles. Crate builds clean (zero warnings). Consumers rewired next.
Delete the engine-local copies of op/model/invariants/fault/backend (now in the shared crate) and rewire the consumers to use omnigraph_dst::…: - dst.rs: the generative walk + s3 battery drive Embedded (the Backend impl) with step + run_battery; op errors classify via classify_backend. The named regressions/concurrency/branch scenarios are white-box, so they keep a raw Omnigraph handle via a local open_clean helper. - dst_recovery.rs: drops the 5 #[path] modules (failpoint cells only need the raw handle + omnigraph_dst::invariants); recovery_walk rewired likewise. - coverage/statemachine/fuzz: stay engine-local (drive the handle directly), now import OpKind/op/invariants from omnigraph_dst. Eliminates the #[path]-duplication that compiled those 5 modules twice (once per test binary). Behavior unchanged: --test dst -> 16 passed, all 4 regressions trip, both walks find only known bugs (0 novel), coverage ops 10/10 invariants 7/7; --test dst_recovery (failpoints) compiles clean.
…mnigraph-dst The same seeded DST op stream now drives BOTH the embedded Omnigraph SDK and the omnigraph CLI subprocess (Embedded + Cli Backend impls from omnigraph-dst, binary via CARGO_BIN_EXE_omnigraph), asserting per-step black-box agreement on the Person/Doc slug sets + traversable edge count, plus op-kind and op-success parity. Two lockstep walks (same seed + identical reference Model) stay in sync because both wrap the same engine; any divergence fails before they desync. Completes the cross-backend item the plan deferred at the crate boundary (harness in the engine, binary in the CLI). White-box battery stays embedded-only by construction; the CLI arm gets the black-box oracles. The walk surfaced one genuine cross-backend behavioral difference, allow-listed as a known contract (not a transport bug): embedded repair(force=false) returns Ok and leaves suspicious/unverifiable drift (e.g. RC-1's) in place, while the CLI `repair --confirm` (no --force) exits non-zero refusing to publish it — both no-ops on logical data, so the state oracles still agree. Every other op's success/failure matches exactly, including through RC-1 errors and optimize. cargo test -p omnigraph-cli --test cli_cross_backend_walk -> 1 passed.
MATRIX.md: D5 CLI backend ❌ deferred → ✅ (PR-E); split out the still-deferred long-lived server walk; update the roadmap summary. testing.md: list cli_cross_backend_walk in the CLI suites, add its run command, and document the omnigraph-dst crate (the Backend trait + Embedded/Cli, the dev-dep cycle, and why the white-box battery stays embedded-only).
There was a problem hiding this comment.
💡 Codex Review
omnigraph/crates/omnigraph-dst/src/model.rs
Lines 84 to 85 in 99abd2a
When cli_cross_backend_walk drives two independent Model::new() instances with the same RNG seed, edge ops sample from these vectors by index. Because HashSet/HashMap iteration is randomly seeded per map, the embedded and CLI models can produce different persons_vec/free_froms orders even with identical logical contents; with the current seeds the first InsertKnows can pick different endpoints, so the supposedly identical op stream diverges and the per-step edge oracle fails/flakes. Sort these vectors or store the model sets in an ordered collection before sampling.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Finding::Logical(s) => { | ||
| if s.contains("dup-@key") { | ||
| return Some("dup-@key MR-714"); | ||
| Some("dup-@key MR-714") |
There was a problem hiding this comment.
Lost Writes Become Known Duplicates
classify now allow-lists any logical finding containing dup-@key, but the count oracle emits that marker in the ambiguous lost-write or dup-@key message. When a real lost write produces count < model, the embedded walk can classify it as the known duplicate-key issue and continue instead of failing.
Context Used: AGENTS.md (source)
…rrors/panics Addresses review findings on the DST oracle soundness (Codex/Greptile): #1 (correctness): the Person count==model message carried the `dup-@key` marker, and classify() allow-lists ANY logical finding containing it as the known MR-714 bug — so a genuine LOST WRITE (count < model), the exact case this oracle guards in the faulted walk, was silently recorded as a known bug instead of failing. Drop the marker from the count message; real duplicate-@key detection stays in check_content (value-level) + invariants::no_duplicate_keys, which emit the marker only when a duplicate is actually found. #4 (oracle soundness): two sites discarded failures and could false-pass. - concurrent_walk_structural_invariants ignored JoinError, hiding any actor panic that left no durable corruption; now classify it (known crash recorded, novel actor panic fails) like the sequential walk. - s3_battery_holds discarded every op::step error; since step only advances the model on success, a swallowed S3 fault let the battery pass over a real backend failure. Now classify op errors (known allow-listed, novel fails). cargo test -p omnigraph-engine --test dst -> 16 passed.
…tAdapter scope Addresses the remaining valid review findings (doc/CI accuracy, no behavior change): #2: s3_battery_holds is bucket-gated and lives in the `dst` target, which no CI job ran — so the DST S3 battery never executed despite MATRIX/testing.md marking S3 ✅. Add a 'RustFS DST white-box battery' step to the rustfs_integration job (same grep-guard against a vacuous pass) and add the dst test paths + omnigraph-dst crate to the RustFS change filter so a DST change triggers the job. #6: the MATRIX 'open / recovery sweep' row said unsampled while the same harness ships the Reopen walk op + the dst_recovery failpoint cells (roll-forward + the #296 concurrent-opens cell). Mark it ✅ with the actual coverage. #7: tighten the FaultAdapter wording (module doc + the seeded_op_loop_with_cas_faults comment) — the seam is the StorageAdapter conditional TEXT-OBJECT write (sidecars/schema-staging/cluster-state), NOT the Lance manifest-publish CAS (MergeInsertBuilder), which dst_recovery covers via failpoints. No code change.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 58a7a3c. Configure here.
| } | ||
| Err(e) => Err(e), | ||
| }; | ||
| (OpKind::DeletePerson, res) |
There was a problem hiding this comment.
Delete updates model on no-op
Low Severity
DeletePerson updates the reference Model on every successful mutate, while UpdateDoc only updates the model when the target doc exists in the model. A successful delete that matches zero rows still calls model.del_person, which can desync the model from the engine when the slug was never live in the model but the op returns Ok.
Reviewed by Cursor Bugbot for commit 58a7a3c. Configure here.
…r JoinError Follow-up review findings on the consolidated PR (Greptile/Cursor): - Cli backend dropped the `branch` arg (Embedded honored it) — the trait contract was main-only by accident. Pass `--branch <branch>` on mutate/query. - cli_cross_backend_walk skipped success parity for EVERY Repair; now allow-list only the known divergence (embedded Ok + CLI 'refused suspicious' drift), so a broken repair invocation still fails. - classify_backend's Cli arm ran the RC-1 allow-list via classify_engine_signal with is_manifest=true, so generic 'expected'+'current' stderr could be masked as RC-1; gate the out-of-process path on the distinctive strings only (stale view / ahead of manifest / from_sorted_iter / dup-@key). - concurrent walk ignored non-panic actor JoinErrors (cancel/abort); surface them instead of falling through silently. Dismissed (not a bug): 'delete updates model on no-op' — del_person on an absent id is a harmless no-op, so count==model still holds. Verified: --test dst -> 16 passed; --test cli_cross_backend_walk -> 1 passed.


Consolidated, TESTS-ONLY DST harness. Formerly the stacked PRs #303 → #305 → #309; retargeted to
mainas a single reviewable unit since all three carry zero production-code changes. The one engine-srcchange in the effort — the deterministic-replay seam — is intentionally kept separate in #304 so the only production change gets isolated review.Supersedes (closed): #303 (comprehensive harness), #305 (fuzz + S3 context). Stays separate: #304 (determinism seam, engine
src).What's in here (all under
crates/omnigraph/tests/dst*+ the newcrates/omnigraph-dst)Dataset::validate, unique live_rowid[RC-X class], scalar-index probe, count/content/edges==model,@keyuniqueness) + a structured known/novel classifier. Pins the known bugs as characterization regressions (RC-1, RC-X, dup-@key/MR-714, self-loop-not-traversable).FaultAdapter), proptest-state-machine shrinking, a D3×D2 read-shape battery, a concurrent multi-actor walk, failpoint recovery cells (dst_recovery, incl. the fix(recovery): converge roll-forward on concurrent manifest advance #296 concurrent-opens-converge cell), parser/loader fuzz, and an env-gated S3-context battery.crates/omnigraph-dst(dev-only, not a default-member) — the reusable harness behind a black-boxBackendtrait (Embedded+Cli), enabling…cli_cross_backend_walk— the same seeded walk driven against the embedded SDK AND theomnigraphCLI subprocess, asserting per-step black-box agreement.The dev-dependency cycle (engine dev-dep → omnigraph-dst → engine) is supported by cargo (lib builds without dev-deps; validated). White-box battery is embedded-only by construction; the CLI arm runs the black-box oracles.
Review findings addressed
dup-@keymarker thatclassify()allow-lists, so a genuine lost write was recorded as known MR-714 instead of failing. Marker dropped; real dup detection stays incheck_content+no_duplicate_keys.JoinErrorpanics, ands3_battery_holdsclassifies op errors, instead of discarding (both could false-pass).rustfs_integrationjob + the dst paths to the RustFS change filter (the env-gated skip itself is by-design).dst_recovery's failpoints).Verification
cargo test -p omnigraph-engine --test dst→ 16 passed.cargo test -p omnigraph-engine --features failpoints --test dst_recovery→ 2 passed.cargo test -p omnigraph-cli --test cli_cross_backend_walk→ 1 passed.Note
Low Risk
Changes are confined to test crates, dev-dependencies, CI, and docs; production binaries are unaffected unless a separate determinism PR lands.
Overview
Adds a tests-only deterministic-simulation (DST) stack: a shared dev crate
omnigraph-dst(op alphabet, reference model, white-box invariant battery,FaultAdapter, and aBackendtrait withEmbedded+Clibackends) plus engine integration suites undertests/dst*and CLI parity tests.The engine
dsttarget runs seeded generative walks with known-vs-novel classification (allow-listing RC-1, RC-X, dup-@key, etc.), characterization regressions, concurrent structural walks, read-shape × morphology battery, parser/loader fuzz, optional CAS-fault walks, env-gateds3_battery_holds, and a separatedst_recoverybinary (failpoints) for manifest sidecar / concurrent-open convergence cells.cli_dst_parityandcli_cross_backend_walklockstep the same op stream against embedded SDK vs CLI subprocess (black-box oracles only on the CLI arm).CI extends RustFS change detection to DST paths and runs
s3_battery_holdswith a grep guard against vacuous passes.docs/dev/testing.mddocuments run modes and the coverage ledger inMATRIX.md.Reviewed by Cursor Bugbot for commit 9fd1ac7. Bugbot is set up for automated code reviews on this repo. Configure here.
Greptile Summary
This PR adds a shared DST test harness and wires it into engine, CLI, CI, and docs. The main changes are:
omnigraph-dstcrate with shared backend, model, op, fault, and invariant helpers.Confidence Score: 5/5
This looks safe to merge.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[omnigraph-dst crate] --> B[Embedded backend] A --> C[CLI backend] A --> D[Reference model] A --> E[Invariant battery] B --> F[Engine dst.rs tests] C --> G[CLI cross-backend tests] E --> F D --> F D --> G F --> H[RustFS S3 CI battery]%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% flowchart TD A[omnigraph-dst crate] --> B[Embedded backend] A --> C[CLI backend] A --> D[Reference model] A --> E[Invariant battery] B --> F[Engine dst.rs tests] C --> G[CLI cross-backend tests] E --> F D --> F D --> G F --> H[RustFS S3 CI battery]Reviews (5): Last reviewed commit: "fix(dst): honor Cli branch; tighten repa..." | Re-trigger Greptile
Context used: