Skip to content

DST harness — generative simulation suite + shared omnigraph-dst crate + cross-backend walk (tests-only; consolidates #303 + #305)#309

Open
ragnorc wants to merge 17 commits into
mainfrom
dst-extract-crate
Open

DST harness — generative simulation suite + shared omnigraph-dst crate + cross-backend walk (tests-only; consolidates #303 + #305)#309
ragnorc wants to merge 17 commits into
mainfrom
dst-extract-crate

Conversation

@ragnorc

@ragnorc ragnorc commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Consolidated, TESTS-ONLY DST harness. Formerly the stacked PRs #303#305#309; retargeted to main as a single reviewable unit since all three carry zero production-code changes. The one engine-src change in the effort — the deterministic-replay seam — is intentionally kept separate in #304 so the only production change gets isolated review.

Supersedes (closed): #303 (comprehensive harness), #305 (fuzz + S3 context). Stays separate: #304 (determinism seam, engine src).

What's in here (all under crates/omnigraph/tests/dst* + the new crates/omnigraph-dst)

  • Generative DST harness — seeded op-alphabet walk + white-box invariant battery after every op (HEAD==manifest, Dataset::validate, unique live _rowid [RC-X class], scalar-index probe, count/content/edges==model, @key uniqueness) + a structured known/novel classifier. Pins the known bugs as characterization regressions (RC-1, RC-X, dup-@key/MR-714, self-loop-not-traversable).
  • Fault injection (FaultAdapter), proptest-state-machine shrinking, a D3×D2 read-shape battery, a concurrent multi-actor walk, failpoint recovery cells (dst_recovery, incl. the fix(recovery): converge roll-forward on concurrent manifest advance #296 concurrent-opens-converge cell), parser/loader fuzz, and an env-gated S3-context battery.
  • crates/omnigraph-dst (dev-only, not a default-member) — the reusable harness behind a black-box Backend trait (Embedded + Cli), enabling…
  • cli_cross_backend_walk — the same seeded walk driven against the embedded SDK AND the omnigraph CLI subprocess, asserting per-step black-box agreement.

The dev-dependency cycle (engine dev-dep → omnigraph-dst → engine) is supported by cargo (lib builds without dev-deps; validated). White-box battery is embedded-only by construction; the CLI arm runs the black-box oracles.

Review findings addressed

  • count==model masking lost-writes — the Person message carried the dup-@key marker that classify() allow-lists, so a genuine lost write was recorded as known MR-714 instead of failing. Marker dropped; real dup detection stays in check_content + no_duplicate_keys.
  • swallowed errors/panics — the concurrent walk now classifies actor JoinError panics, and s3_battery_holds classifies op errors, instead of discarding (both could false-pass).
  • S3 battery never ran in CI — added a step to the rustfs_integration job + the dst paths to the RustFS change filter (the env-gated skip itself is by-design).
  • stale MATRIX recovery row + FaultAdapter scope wording tightened (the seam is the StorageAdapter conditional text-object write, not the Lance manifest-publish CAS — that's dst_recovery's failpoints).

Verification

  • cargo test -p omnigraph-engine --test dst16 passed.
  • cargo test -p omnigraph-engine --features failpoints --test dst_recovery2 passed.
  • cargo test -p omnigraph-cli --test cli_cross_backend_walk1 passed.
  • Crate builds clean; no cargo cycle; CI YAML validated.

Note for merge: #304 (determinism) is based on #303's commits; once this lands on main, #304 needs a rebase onto main (its #303-based commits become redundant).


Note

Low Risk
Changes are confined to test crates, dev-dependencies, CI, and docs; production binaries are unaffected unless a separate determinism PR lands.

Overview
Adds a tests-only deterministic-simulation (DST) stack: a shared dev crate omnigraph-dst (op alphabet, reference model, white-box invariant battery, FaultAdapter, and a Backend trait with Embedded + Cli backends) plus engine integration suites under tests/dst* and CLI parity tests.

The engine dst target runs seeded generative walks with known-vs-novel classification (allow-listing RC-1, RC-X, dup-@key, etc.), characterization regressions, concurrent structural walks, read-shape × morphology battery, parser/loader fuzz, optional CAS-fault walks, env-gated s3_battery_holds, and a separate dst_recovery binary (failpoints) for manifest sidecar / concurrent-open convergence cells. cli_dst_parity and cli_cross_backend_walk lockstep the same op stream against embedded SDK vs CLI subprocess (black-box oracles only on the CLI arm).

CI extends RustFS change detection to DST paths and runs s3_battery_holds with a grep guard against vacuous passes. docs/dev/testing.md documents run modes and the coverage ledger in MATRIX.md.

Reviewed by Cursor Bugbot for commit 9fd1ac7. Bugbot is set up for automated code reviews on this repo. Configure here.

Greptile Summary

This PR adds a shared DST test harness and wires it into engine, CLI, CI, and docs. The main changes are:

  • New dev-only omnigraph-dst crate with shared backend, model, op, fault, and invariant helpers.
  • New engine DST suites for generative walks, recovery cells, fuzzing, read-shape coverage, and S3 coverage.
  • New CLI parity tests that compare embedded and subprocess-backed execution.
  • RustFS CI wiring for the S3 DST battery.
  • Updated testing docs for the new harness.

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.

Important Files Changed

Filename Overview
crates/omnigraph-dst/src/backend.rs Adds the shared backend trait and embedded/CLI implementations used by the DST walks.
crates/omnigraph-dst/src/invariants.rs Adds the known/novel classifier and the embedded white-box invariant battery.
crates/omnigraph-dst/src/model.rs Adds the reference model and black-box count, content, and edge oracles.
crates/omnigraph-dst/src/op.rs Defines the seeded operation alphabet shared by embedded and CLI test paths.
crates/omnigraph/tests/dst.rs Adds the main engine DST suite, including regressions, walks, concurrent coverage, and the S3 battery.
crates/omnigraph-cli/tests/cli_cross_backend_walk.rs Adds the generative embedded-vs-CLI parity walk.
.github/workflows/ci.yml Adds DST paths to RustFS change detection and runs the DST S3 battery in that job.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[omnigraph-dst crate] --> B[Embedded backend]
    A --> C[CLI backend]
    A --> D[Reference model]
    A --> E[Invariant battery]
    B --> F[Engine dst.rs tests]
    C --> G[CLI cross-backend tests]
    E --> F
    D --> F
    D --> G
    F --> H[RustFS S3 CI battery]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[omnigraph-dst crate] --> B[Embedded backend]
    A --> C[CLI backend]
    A --> D[Reference model]
    A --> E[Invariant battery]
    B --> F[Engine dst.rs tests]
    C --> G[CLI cross-backend tests]
    E --> F
    D --> F
    D --> G
    F --> H[RustFS S3 CI battery]
Loading

Reviews (5): Last reviewed commit: "fix(dst): honor Cli branch; tighten repa..." | Re-trigger Greptile

Context used:

  • Context used - AGENTS.md (source)
  • Context used - CLAUDE.md (source)

ragnorc added 13 commits June 25, 2026 12:21
…, critic findings)

Port: brings the full harness (B1-B6) onto current main (0.7.2). Verified
all 5 bug guards still reproduce on 0.7.2 — its only fix (#296 recovery
roll-forward convergence) addresses a DIFFERENT bug than any the harness
catches. Adds test-only dev-deps (arrow-array, futures, serde_json,
async-trait, proptest-state-machine).

Harden (completeness-critic pass):
- MATRIX.md — the coverage ledger: enumerates D1-D5 sampled/unsampled
  cells, names the HIDDEN dimensions found only after a miss, and explains
  why exhaustive sampling is impossible (path-spaces / non-enumerable
  schedules / open-world model). Prioritized gap backlog.
- Reopen op — open/recovery sweep is now a first-class walk op (drop +
  reopen mid-walk), sampled across varied table states, not only at the
  end. ops 10/10, 0 novel violations.
- Critic finding #1 (#296 cell): concurrent open / >=2 recovery sweeps on
  one sidecar is unsampled — the named blind spot that let #296 through.
- Critic finding #2 (fault seam): the FaultAdapter wraps
  StorageAdapter::write_text_if_match, but the __manifest publish is a
  Lance MergeInsertBuilder CAS (publisher.rs:377) that never flows through
  it — verified empirically (cas_pct=100 write succeeds, no sidecar). So
  Phase-2 'manifest CAS fault injection' is off the hot path; real
  injection needs failpoints or a Lance object_store wrapper. This is the
  prerequisite for closing the #296 cell.
The StorageAdapter FaultAdapter is off the manifest-publish path, so it
can't induce a recovery sidecar. Failpoints can. New tests/dst_recovery.rs
(own binary — the process-global fail registry must not leak into the
main, non-serial walks; verified the in-binary version broke 3 parallel
tests):

- recovery_rolls_forward_under_finalize_failure: mutation.post_finalize_pre_publisher
  leaves a real RolledPastExpected sidecar; reopen rolls it forward; the
  harness WHITE-BOX structural battery holds (additive vs the engine's
  count-only test).
- concurrent_opens_converge_on_pending_sidecar (the #296 cell): an inline
  park-first rendezvous at recovery.before_roll_forward_publish forces two
  open sweeps to race one sidecar; the CAS-loser must CONVERGE, not fail the
  open. Non-vacuous (rendezvous panics if the race never fires); guards
  0.7.2 (the failpoint was added with the #296 fix, so it can't run on true
  0.7.1).

MATRIX.md backlog: fault-seam (#1), #296 cell (#2), Reopen op (#3) now
DONE; cross-process + generative-via-object_store remain.
The #7230 lesson: a parse/merge path only ever fed clean input shipped a
latent crash. Fuzz the GQ query parser (find_named_query), the schema
parser (parse_schema), and the JSONL loader with arbitrary + structured
+ adversarial (dup-@key, reordered/extra/missing fields, wrong types,
malformed JSON, orphan edges, enum-out-of-range, huge strings) input and
assert they return a Result, never panic — and the graph stays usable
after the abuse. proptest shrinks+persists any panic. cargo-fuzz/libFuzzer
targets deferred (need nightly); this gets the same coverage on stable.
All green — no parser panics found.
Runs a short seeded op sequence + the full white-box battery against a
real object-store backend (s3://), not just local FS — the D5 S3 context
dimension. Gated on OMNIGRAPH_S3_TEST_BUCKET (skips locally; runs in CI's
rustfs_integration job), mirroring tests/s3_storage.rs. Known bugs
allow-listed, novel -> fail.
…arity

A DST-flavored op sequence (init -> 2x load --merge multi-fragment ->
insert -> edge-free delete -> query) run against the embedded Omnigraph
SDK and the omnigraph CLI subprocess on twin local graphs must agree on
the final person-slug set. Fills the gap parity_matrix.rs leaves (it does
single-verb CLI-local-vs-server; this adds the embedded-SDK arm + a
multi-op sequence). Format-tolerant oracle (embedded to_rust_json vs CLI
--json differ in wrapping): compare the set of smoke-* slug strings. The
CLI wraps the same engine, so this guards the CLI transport layer (arg
parse, --json serialization, --store addressing, --as actor) against
divergence from embedded. Reuses support::cli(). Green.
…_dst_parity, the dst feature)

Per the maintenance contract: add the DST harness to the test-coverage
map — engine table rows for dst.rs (generative walks + white-box battery
+ regressions + statemachine shrinking + readshape + fuzz + S3 cell +
replay-equality) and dst_recovery.rs (failpoint recovery / #296 cells);
the cli_dst_parity cross-backend smoke in the omnigraph-cli row; a
'Running the DST harness' section (the feature flags + run commands + the
test-only zero-cost dst determinism seam); and the S3 cell in the rustfs
CI list. Points at dst/MATRIX.md for the coverage ledger.
…(PR-E foundation)

New dev-only crate crates/omnigraph-dst (workspace member, NOT a
default-member) depending on omnigraph-engine + omnigraph-compiler, wired
as a [dev-dependencies] of omnigraph-engine. This forms a dev-dependency
cycle (engine dev-dep -> omnigraph-dst -> engine); VERIFIED cargo accepts
it (cargo metadata resolves; cargo build -p omnigraph-dst finishes) — the
make-or-break gate for extracting op/model/invariants behind a Backend
trait so the generative walk can run cross-backend (embedded + CLI).
Modules land in the following commits.
…nto omnigraph-dst

The generative walk was hard-coupled to `&Omnigraph` (step called
db.mutate/load_jsonl/db.query directly), so it could only run embedded. Move
the reusable harness primitives into the omnigraph-dst lib behind a black-box
`Backend` trait so the SAME seeded walk can drive any context:

- backend.rs: `Backend` (load/mutate/query/optimize/repair, normalized rows)
  + `BackendError`{Engine(OmniError)|Cli{code,stderr}} + `Embedded` (wraps
  Omnigraph, optional FaultAdapter, exposes db() for the white-box battery)
  + `Cli` (subprocess; binary path injected by the consumer).
- op.rs: `step<B: Backend>` (was step(&Omnigraph)); all gq standardized to a
  single query named `q` so embedded names it and the CLI infers it from -e.
- model.rs: count/content/edges-traversable==model are now black-box (via
  Backend::query, key parsed by suffix to survive embedded-vs-CLI key naming);
  check_edges keeps its raw white-box snapshot half (embedded-only).
- invariants.rs: white-box battery unchanged (takes &Omnigraph via db());
  classifier split into classify(&Finding) + classify_backend(&BackendError)
  over a shared engine-signal core; run_battery takes &Embedded.
- fault.rs: FaultAdapter moved verbatim.

White-box stays embedded-only by construction; a cross-backend walk runs the
black-box oracles. Crate builds clean (zero warnings). Consumers rewired next.
Delete the engine-local copies of op/model/invariants/fault/backend (now in
the shared crate) and rewire the consumers to use omnigraph_dst::…:

- dst.rs: the generative walk + s3 battery drive Embedded (the Backend impl)
  with step + run_battery; op errors classify via classify_backend. The named
  regressions/concurrency/branch scenarios are white-box, so they keep a raw
  Omnigraph handle via a local open_clean helper.
- dst_recovery.rs: drops the 5 #[path] modules (failpoint cells only need the
  raw handle + omnigraph_dst::invariants); recovery_walk rewired likewise.
- coverage/statemachine/fuzz: stay engine-local (drive the handle directly),
  now import OpKind/op/invariants from omnigraph_dst.

Eliminates the #[path]-duplication that compiled those 5 modules twice (once
per test binary). Behavior unchanged: --test dst -> 16 passed, all 4
regressions trip, both walks find only known bugs (0 novel), coverage
ops 10/10 invariants 7/7; --test dst_recovery (failpoints) compiles clean.
…mnigraph-dst

The same seeded DST op stream now drives BOTH the embedded Omnigraph SDK and
the omnigraph CLI subprocess (Embedded + Cli Backend impls from omnigraph-dst,
binary via CARGO_BIN_EXE_omnigraph), asserting per-step black-box agreement on
the Person/Doc slug sets + traversable edge count, plus op-kind and op-success
parity. Two lockstep walks (same seed + identical reference Model) stay in sync
because both wrap the same engine; any divergence fails before they desync.

Completes the cross-backend item the plan deferred at the crate boundary
(harness in the engine, binary in the CLI). White-box battery stays
embedded-only by construction; the CLI arm gets the black-box oracles.

The walk surfaced one genuine cross-backend behavioral difference, allow-listed
as a known contract (not a transport bug): embedded repair(force=false) returns
Ok and leaves suspicious/unverifiable drift (e.g. RC-1's) in place, while the
CLI `repair --confirm` (no --force) exits non-zero refusing to publish it —
both no-ops on logical data, so the state oracles still agree. Every other op's
success/failure matches exactly, including through RC-1 errors and optimize.

cargo test -p omnigraph-cli --test cli_cross_backend_walk -> 1 passed.
MATRIX.md: D5 CLI backend ❌ deferred → ✅ (PR-E); split out the still-deferred
long-lived server walk; update the roadmap summary. testing.md: list
cli_cross_backend_walk in the CLI suites, add its run command, and document the
omnigraph-dst crate (the Backend trait + Embedded/Cli, the dev-dep cycle, and
why the white-box battery stays embedded-only).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

pub fn persons_vec(&self) -> Vec<usize> {
self.persons.iter().copied().collect()

P1 Badge Make model iteration deterministic for lockstep walks

When cli_cross_backend_walk drives two independent Model::new() instances with the same RNG seed, edge ops sample from these vectors by index. Because HashSet/HashMap iteration is randomly seeded per map, the embedded and CLI models can produce different persons_vec/free_froms orders even with identical logical contents; with the current seeds the first InsertKnows can pick different endpoints, so the supposedly identical op stream diverges and the per-step edge oracle fails/flakes. Sort these vectors or store the model sets in an ordered collection before sampling.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 74 to +76
Finding::Logical(s) => {
if s.contains("dup-@key") {
return Some("dup-@key MR-714");
Some("dup-@key MR-714")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Lost Writes Become Known Duplicates

classify now allow-lists any logical finding containing dup-@key, but the count oracle emits that marker in the ambiguous lost-write or dup-@key message. When a real lost write produces count < model, the embedded walk can classify it as the known duplicate-key issue and continue instead of failing.

Context Used: AGENTS.md (source)

Fix in Claude Code

Comment thread crates/omnigraph-dst/src/backend.rs Outdated
Comment thread crates/omnigraph-cli/tests/cli_cross_backend_walk.rs Outdated
…rrors/panics

Addresses review findings on the DST oracle soundness (Codex/Greptile):

#1 (correctness): the Person count==model message carried the `dup-@key`
marker, and classify() allow-lists ANY logical finding containing it as the
known MR-714 bug — so a genuine LOST WRITE (count < model), the exact case this
oracle guards in the faulted walk, was silently recorded as a known bug instead
of failing. Drop the marker from the count message; real duplicate-@key
detection stays in check_content (value-level) + invariants::no_duplicate_keys,
which emit the marker only when a duplicate is actually found.

#4 (oracle soundness): two sites discarded failures and could false-pass.
- concurrent_walk_structural_invariants ignored JoinError, hiding any actor
  panic that left no durable corruption; now classify it (known crash recorded,
  novel actor panic fails) like the sequential walk.
- s3_battery_holds discarded every op::step error; since step only advances the
  model on success, a swallowed S3 fault let the battery pass over a real
  backend failure. Now classify op errors (known allow-listed, novel fails).

cargo test -p omnigraph-engine --test dst -> 16 passed.
Comment thread crates/omnigraph-dst/src/backend.rs
Comment thread crates/omnigraph-cli/tests/cli_cross_backend_walk.rs
…tAdapter scope

Addresses the remaining valid review findings (doc/CI accuracy, no behavior change):

#2: s3_battery_holds is bucket-gated and lives in the `dst` target, which no CI
job ran — so the DST S3 battery never executed despite MATRIX/testing.md marking
S3 ✅. Add a 'RustFS DST white-box battery' step to the rustfs_integration job
(same grep-guard against a vacuous pass) and add the dst test paths +
omnigraph-dst crate to the RustFS change filter so a DST change triggers the job.

#6: the MATRIX 'open / recovery sweep' row said unsampled while the same harness
ships the Reopen walk op + the dst_recovery failpoint cells (roll-forward + the
#296 concurrent-opens cell). Mark it ✅ with the actual coverage.

#7: tighten the FaultAdapter wording (module doc + the seeded_op_loop_with_cas_faults
comment) — the seam is the StorageAdapter conditional TEXT-OBJECT write
(sidecars/schema-staging/cluster-state), NOT the Lance manifest-publish CAS
(MergeInsertBuilder), which dst_recovery covers via failpoints. No code change.
@ragnorc ragnorc changed the title PR-E: extract omnigraph-dst crate + full generative cross-backend (embedded vs CLI) walk DST harness — generative simulation suite + shared omnigraph-dst crate + cross-backend walk (tests-only; consolidates #303 + #305) Jun 26, 2026
@ragnorc ragnorc changed the base branch from dst-d5-fanout to main June 26, 2026 17:06

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 58a7a3c. Configure here.

Comment thread crates/omnigraph/tests/dst.rs
Comment thread crates/omnigraph-dst/src/invariants.rs
}
Err(e) => Err(e),
};
(OpKind::DeletePerson, res)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete updates model on no-op

Low Severity

DeletePerson updates the reference Model on every successful mutate, while UpdateDoc only updates the model when the target doc exists in the model. A successful delete that matches zero rows still calls model.del_person, which can desync the model from the engine when the slug was never live in the model but the op returns Ok.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 58a7a3c. Configure here.

…r JoinError

Follow-up review findings on the consolidated PR (Greptile/Cursor):

- Cli backend dropped the `branch` arg (Embedded honored it) — the trait
  contract was main-only by accident. Pass `--branch <branch>` on mutate/query.
- cli_cross_backend_walk skipped success parity for EVERY Repair; now allow-list
  only the known divergence (embedded Ok + CLI 'refused suspicious' drift), so a
  broken repair invocation still fails.
- classify_backend's Cli arm ran the RC-1 allow-list via classify_engine_signal
  with is_manifest=true, so generic 'expected'+'current' stderr could be masked
  as RC-1; gate the out-of-process path on the distinctive strings only
  (stale view / ahead of manifest / from_sorted_iter / dup-@key).
- concurrent walk ignored non-panic actor JoinErrors (cancel/abort); surface
  them instead of falling through silently.

Dismissed (not a bug): 'delete updates model on no-op' — del_person on an
absent id is a harmless no-op, so count==model still holds.

Verified: --test dst -> 16 passed; --test cli_cross_backend_walk -> 1 passed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant