test(dst): comprehensive DST harness on 0.7.2 + completeness-critic hardening (supersedes #300, #302) by ragnorc · Pull Request #303 · ModernRelay/omnigraph

ragnorc · 2026-06-25T10:44:35Z

Supersedes #300 and #302 — consolidates the full DST harness onto current main (0.7.2) in one PR, plus the completeness-critic hardening pass. Tests-only (no engine src changes).

The harness (B1–B6, was #300 + #302)

A seeded, model-based DST harness over the morphological matrix (D1 ops · D2 fragment morphology · D3 read shapes · D4 oracles · D5 context):

White-box invariant battery after every op: HEAD==manifest, Dataset::validate, unique live _rowid (RC-X corruption class, deletion-vector-correct), index-probe, count==model, content==model, edges==model (RI), @key-unique.
Generative walk (panic-robust via catch_unwind + classify_panic), branch isolation/merge scenario, read-shape battery × 4 morphologies, concurrent multi-actor walk, and a proptest-state-machine shrinking campaign over the clean op subset.
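The panic containment named above (`catch_unwind` + `classify_panic`) could be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: the `PanicClass` enum, the `run_contained` helper, the `"fts-oob"` label, and the marker string are all hypothetical; only `std::panic::catch_unwind` is a real standard-library API.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical verdict for a contained panic.
#[derive(Debug, PartialEq)]
enum PanicClass {
    Known(&'static str),
    Novel(String),
}

/// Extract a readable message from a panic payload.
/// `panic!("literal")` yields `&'static str`; formatted panics yield `String`.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "<non-string panic payload>".to_string()
    }
}

/// Assumed classification rule: a marker substring maps to a known bug,
/// anything else is novel. The marker here is illustrative only.
fn classify_panic(msg: &str) -> PanicClass {
    if msg.contains("index out of bounds") {
        PanicClass::Known("fts-oob")
    } else {
        PanicClass::Novel(msg.to_string())
    }
}

/// Run one op; return `None` if it completed, or the class of its panic.
fn run_contained<F: FnOnce()>(op: F) -> Option<PanicClass> {
    match panic::catch_unwind(AssertUnwindSafe(op)) {
        Ok(()) => None,
        Err(payload) => Some(classify_panic(&panic_message(&*payload))),
    }
}
```

A walk built this way can keep going after a known panic and fail only when `run_contained` returns `PanicClass::Novel`.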

Bugs guarded (5)

Reproduced/pinned as characterization tests, all still present on 0.7.2 (verified):

bug	layer	guard
RC-X / Lance #7230 (scalar-BTREE dup row addr)	Lance (fixed v8.0.0)	`regression_rc_x_…` + index-probe
RC-1 stale-view manifest CAS on delete combos	Omnigraph	`regression_rc1_…` + HEAD==manifest
dup-`@key` (MR-714)	Omnigraph	`regression_dup_key_…` + concurrent walk
self-loop `Knows` not traversable (NEW)	Omnigraph (Expand)	`regression_self_loop_…` + edges==model
Lance FTS inverted-builder OOB panic (NEW)	Lance	caught + classified by the walk

Hardening (completeness-critic pass)

dst/MATRIX.md — a coverage ledger: D1–D5 sampled/unsampled cells, the hidden dimensions found only after a miss, and why exhaustive sampling is impossible. A living artifact, re-run when an external bug slips through.
Reopen op — the recovery sweep is now a first-class walk op.
Critic finding: the Phase-2 FaultAdapter wraps StorageAdapter::write_text_if_match, which is off the manifest-publish path (a Lance MergeInsertBuilder CAS) — so it couldn't induce a recovery sidecar.
Fault-seam closed via a --features failpoints variant in its own binary (tests/dst_recovery.rs): induces a real RolledPastExpected sidecar, and a rendezvous-forced concurrent-open race (the #296 blind spot) asserting the CAS-loser converges. Run: cargo test -p omnigraph-engine --features failpoints --test dst_recovery.
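The rendezvous-forced race can be illustrated with a toy analog: two "open sweeps" read the same expected version, meet at a barrier, then both attempt the same compare-and-swap. The sketch below assumes an `AtomicU64` standing in for the manifest head and a hypothetical `roll_forward` helper; it does not use the engine's failpoints or Lance.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Outcome {
    Published, // this sweep won the CAS
    Converged, // this sweep lost, but the head already holds its target
}

/// Hypothetical roll-forward: a lost CAS is not an error when the winner
/// already moved the head to the version this sweep wanted.
fn roll_forward(head: &AtomicU64, expected: u64, target: u64) -> Result<Outcome, String> {
    match head.compare_exchange(expected, target, Ordering::SeqCst, Ordering::SeqCst) {
        Ok(_) => Ok(Outcome::Published),
        Err(actual) if actual == target => Ok(Outcome::Converged),
        Err(actual) => Err(format!("head moved to unexpected version {actual}")),
    }
}

/// Force two sweeps to race on one pending roll-forward from version 1 to 2.
fn race_two_sweeps() -> Vec<Result<Outcome, String>> {
    let head = Arc::new(AtomicU64::new(1));
    let barrier = Arc::new(Barrier::new(2));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let head = Arc::clone(&head);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Both sweeps observe version 1 before either may publish.
                let expected = head.load(Ordering::SeqCst);
                barrier.wait();
                roll_forward(&head, expected, 2)
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("sweep thread panicked"))
        .collect()
}
```

Because neither thread can pass the barrier until both have loaded the head, the race always fires: exactly one sweep publishes and the other converges, which is the property the `#296` cell asserts.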

Why 0.7.2

The only fix between 0.7.1 and 0.7.2 (#296, recovery roll-forward convergence) fixes a different recovery bug than any the harness catches — confirmed by running the harness against 0.7.2 (all 5 guards stay green). The #296 cell is now sampled so its class can't slip again.

Run

cargo test -p omnigraph-engine --test dst (10 tests)
cargo test -p omnigraph-engine --features failpoints --test dst_recovery (2 cells)

New test-only dev-deps: arrow-array, futures, serde_json, async-trait, proptest-state-machine.

Note

Low Risk
Test-only additions and dev-dependencies; no production engine code paths change.

Overview
Tests-only — adds a deterministic-simulation / morphological-matrix (DST) harness under crates/omnigraph/tests/dst/ with entrypoints tests/dst.rs and tests/dst_recovery.rs (no engine src changes).

The harness drives the real omnigraph-engine through a seeded op alphabet, a reference model, and a white-box invariant battery after each step (HEAD==manifest, Dataset::validate, unique live _rowid, index probe, count/content/edges vs model, @key checks). Findings are classified as known open bugs (allow-listed) vs novel (fail). Coverage is tracked; dst/MATRIX.md documents sampled vs gap matrix dimensions and critic findings (e.g. FaultAdapter CAS seam vs manifest publish path).
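The seeded walk, reference model, and per-step invariant battery described here follow a common model-based testing shape. The sketch below is a self-contained toy, assuming an in-memory `ToyEngine` in place of omnigraph-engine; every type and function name in it is hypothetical, and it checks only four of the listed invariants (unique live row id, key uniqueness, count==model, content==model).

```rust
use std::collections::{BTreeMap, HashSet};

/// xorshift64* PRNG: the seed fully determines the op sequence.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        self.0.wrapping_mul(0x2545F4914F6CDD1D)
    }
}

#[derive(Debug, Clone, Copy)]
enum Op {
    Upsert { key: u32, val: u32 },
    Delete { key: u32 },
}

struct Row { rowid: u64, key: u32, val: u32, live: bool }

/// Append-only toy store: deletes tombstone rows, like a deletion vector.
#[derive(Default)]
struct ToyEngine { rows: Vec<Row>, next_rowid: u64 }

impl ToyEngine {
    fn apply(&mut self, op: Op) {
        let key = match op { Op::Upsert { key, .. } | Op::Delete { key } => key };
        for r in self.rows.iter_mut().filter(|r| r.live && r.key == key) {
            r.live = false;
        }
        if let Op::Upsert { key, val } = op {
            self.rows.push(Row { rowid: self.next_rowid, key, val, live: true });
            self.next_rowid += 1;
        }
    }
}

/// Invariant battery: run after every op.
fn check_invariants(engine: &ToyEngine, model: &BTreeMap<u32, u32>) -> Result<(), String> {
    let mut rowids = HashSet::new();
    let mut content = BTreeMap::new();
    for r in engine.rows.iter().filter(|r| r.live) {
        if !rowids.insert(r.rowid) {
            return Err(format!("duplicate live rowid {}", r.rowid));
        }
        if content.insert(r.key, r.val).is_some() {
            return Err(format!("duplicate key {}", r.key));
        }
    }
    if content.len() != model.len() {
        return Err(format!("count {} != model {}", content.len(), model.len()));
    }
    if &content != model {
        return Err("content != model".to_string());
    }
    Ok(())
}

/// Seeded walk: apply each op to engine and model, then run the battery.
fn walk(seed: u64, steps: usize) -> Result<usize, String> {
    let mut rng = Rng(seed | 1); // xorshift state must be non-zero
    let mut engine = ToyEngine::default();
    let mut model: BTreeMap<u32, u32> = BTreeMap::new();
    for step in 0..steps {
        let key = (rng.next_u64() % 8) as u32;
        let op = if rng.next_u64() % 4 == 0 {
            Op::Delete { key }
        } else {
            Op::Upsert { key, val: (rng.next_u64() % 100) as u32 }
        };
        engine.apply(op);
        match op {
            Op::Upsert { key, val } => { model.insert(key, val); }
            Op::Delete { key } => { model.remove(&key); }
        }
        check_invariants(&engine, &model)
            .map_err(|e| format!("seed {seed} step {step} {op:?}: {e}"))?;
    }
    Ok(model.len())
}
```

Including the seed, step, and op in the error string is the design choice that makes a failure replayable: rerunning `walk` with the same seed reproduces the same sequence.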

New scenarios: named characterization regressions (RC-1, RC-X, dup-@key, self-loop traversal, etc.); generative walks with optional FaultAdapter CAS faults, mid-walk Reopen, and panic containment; concurrent multi-actor structural walk; branch isolation/merge; read-shape battery × table morphologies; proptest-state-machine shrinking on clean ops.

Failpoint binary (--features failpoints --test dst_recovery): real pending recovery sidecars and deterministic concurrent-open race (#296), isolated from parallel main DST runs.

Dev-deps: proptest-state-machine, plus arrow-array, futures, serde_json, async-trait for the harness.

Reviewed by Cursor Bugbot for commit 76fa48f. Bugbot is set up for automated code reviews on this repo.

Greptile Summary

This PR adds a test-only DST harness for Omnigraph. The main changes are:

Seeded operation walks with model and invariant checks.
Regression guards for known Lance and Omnigraph bugs.
Read-shape and morphology coverage tests.
Failpoint-gated recovery race tests.
A coverage ledger for sampled and deferred cells.

Confidence Score: 4/5

The DST harness needs a small correctness fix before merging.

A generic Person count mismatch can be classified as the known duplicate-key bug.
That can hide lost-write failures in the new walk.
The new build and failpoint test surfaces otherwise look consistent.

crates/omnigraph/tests/dst/model.rs

Important Files Changed

Filename	Overview
crates/omnigraph/tests/dst.rs	Adds the main DST test binary with seeded walks, regression guards, read-shape coverage, and durability checks.
crates/omnigraph/tests/dst/model.rs	Adds the reference model and invariant checks, with one count-mismatch message that can be over-allow-listed.
crates/omnigraph/tests/dst/invariants.rs	Adds known-bug classification and the invariant battery used by the walk.
crates/omnigraph/tests/dst/op.rs	Defines the generated operation alphabet and model update rules.
crates/omnigraph/tests/dst/recovery_walk.rs	Adds failpoint-driven recovery and concurrent-open race coverage.
crates/omnigraph/tests/dst/statemachine.rs	Adds a proptest-state-machine campaign over the clean operation subset.
crates/omnigraph/tests/dst/MATRIX.md	Documents DST coverage and remaining gaps, with one stale recovery row.
crates/omnigraph/Cargo.toml	Adds the test-support dependencies used by the new harness.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
  A[Seeded DST walk] --> B[Run engine op]
  B --> C{Op succeeds?}
  C -->|yes| D[Update model]
  C -->|no| E[Classify error]
  D --> F[Run invariant battery]
  E --> F
  F --> G{Finding?}
  G -->|known| H[Record known bug]
  G -->|novel| I[Fail test]
  G -->|none| A
  A --> J[Reopen durability check]
  J --> F

Reviews (1): Last reviewed commit: "chore: lock new test-only dev-deps (arro..."

Greptile also left 2 inline comments on this PR.

Context used:

Context used - AGENTS.md (source)
Context used - CLAUDE.md (source)

…, critic findings)

Port: brings the full harness (B1-B6) onto current main (0.7.2). Verified all 5 bug guards still reproduce on 0.7.2 — its only fix (#296 recovery roll-forward convergence) addresses a DIFFERENT bug than any the harness catches. Adds test-only dev-deps (arrow-array, futures, serde_json, async-trait, proptest-state-machine).

Harden (completeness-critic pass):
- MATRIX.md — the coverage ledger: enumerates D1-D5 sampled/unsampled cells, names the HIDDEN dimensions found only after a miss, and explains why exhaustive sampling is impossible (path-spaces / non-enumerable schedules / open-world model). Prioritized gap backlog.
- Reopen op — open/recovery sweep is now a first-class walk op (drop + reopen mid-walk), sampled across varied table states, not only at the end. ops 10/10, 0 novel violations.
- Critic finding #1 (#296 cell): concurrent open / >=2 recovery sweeps on one sidecar is unsampled — the named blind spot that let #296 through.
- Critic finding #2 (fault seam): the FaultAdapter wraps StorageAdapter::write_text_if_match, but the __manifest publish is a Lance MergeInsertBuilder CAS (publisher.rs:377) that never flows through it — verified empirically (cas_pct=100 write succeeds, no sidecar). So Phase-2 'manifest CAS fault injection' is off the hot path; real injection needs failpoints or a Lance object_store wrapper. This is the prerequisite for closing the #296 cell.

The StorageAdapter FaultAdapter is off the manifest-publish path, so it can't induce a recovery sidecar. Failpoints can. New tests/dst_recovery.rs (own binary — the process-global fail registry must not leak into the main, non-serial walks; verified the in-binary version broke 3 parallel tests):
- recovery_rolls_forward_under_finalize_failure: mutation.post_finalize_pre_publisher leaves a real RolledPastExpected sidecar; reopen rolls it forward; the harness WHITE-BOX structural battery holds (additive vs the engine's count-only test).
- concurrent_opens_converge_on_pending_sidecar (the #296 cell): an inline park-first rendezvous at recovery.before_roll_forward_publish forces two open sweeps to race one sidecar; the CAS-loser must CONVERGE, not fail the open. Non-vacuous (rendezvous panics if the race never fires); guards 0.7.2 (the failpoint was added with the #296 fix, so it can't run on true 0.7.1).

MATRIX.md backlog: fault-seam (#1), #296 cell (#2), Reopen op (#3) now DONE; cross-process + generative-via-object_store remain.

… proptest-state-machine)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76fa48f35c

chatgpt-codex-connector · 2026-06-25T10:48:20Z

+        return Err(Finding::Logical(format!(
+            "count Person={p} != model={} (lost-write or dup-@key)",


Stop labeling all Person count drift as dup-key

When the Person row count diverges, this message always includes the dup-@key marker, and classify() allow-lists any logical finding containing that marker as MR-714. In the faulted walk, a lost write (count < model) or a write that committed after returning an error (count > model) will therefore be recorded as a known dup-key issue instead of failing, so the count==model oracle cannot catch the silent-loss cases it is meant to guard.
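The masking effect is easy to reproduce in isolation. The sketch below assumes a substring-based allow-list, as the comment describes; `classify_by_marker`, `Verdict`, and both message builders are hypothetical stand-ins rather than the harness's `classify()` or its model code. It contrasts a shared message that always carries the marker with a split message that carries it only when duplicate keys were actually observed.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Known(&'static str),
    Novel,
}

/// Assumed allow-list rule: any finding containing the marker is MR-714.
fn classify_by_marker(msg: &str) -> Verdict {
    if msg.contains("dup-@key") {
        Verdict::Known("MR-714")
    } else {
        Verdict::Novel
    }
}

/// Message shape under review: the marker appears on every count drift.
fn count_message_shared(count: usize, model: usize) -> String {
    format!("count Person={count} != model={model} (lost-write or dup-@key)")
}

/// One possible fix: emit the marker only with direct evidence of
/// duplicated keys, so plain count drift stays a novel finding.
fn count_message_split(count: usize, model: usize, duplicate_keys: usize) -> String {
    if duplicate_keys > 0 {
        format!("count Person={count} != model={model} (dup-@key: {duplicate_keys} duplicated)")
    } else {
        format!("count Person={count} != model={model} (lost-write or unexpected extra row)")
    }
}
```

With the shared message, a lost write (`count < model`) is classified as `Known("MR-714")`; with the split message the same drift is `Novel` and would fail the walk.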

chatgpt-codex-connector · 2026-06-25T10:48:20Z

+        if self.roll(self.cas_conflict_pct) {
+            return Ok(None);


Inject faults on a seam the engine write path uses

In the embedded engine context opened by open_faulted, this override never exercises graph manifest publish failures: the engine publishes __manifest through Lance MergeInsertBuilder/ManifestBatchPublisher, and a repo-wide check of write_text_if_match shows it is used by cluster state/adapter tests rather than the Omnigraph write path. As a result, seeded_op_loop_with_cas_faults can run as another clean walk with no injected CAS conflicts, giving false coverage for the manifest-fault cell.
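One way to keep a fault-injection cell from passing vacuously is to count injected faults and assert the count is non-zero after the walk. The sketch below is a toy under stated assumptions: the `Storage` trait, its `write_if_match` signature, and both publish functions are hypothetical, and do not mirror the real `StorageAdapter::write_text_if_match` signature.

```rust
use std::cell::Cell;

/// Hypothetical conditional-write seam.
trait Storage {
    fn write_if_match(&self, expected: u64, new: u64) -> bool;
}

struct Plain { version: Cell<u64> }

impl Storage for Plain {
    fn write_if_match(&self, expected: u64, new: u64) -> bool {
        if self.version.get() == expected {
            self.version.set(new);
            true
        } else {
            false
        }
    }
}

/// Wrapper that injects a CAS conflict and counts each injection.
struct FaultAdapter<S> { inner: S, always_conflict: bool, injected: Cell<u32> }

impl<S: Storage> Storage for FaultAdapter<S> {
    fn write_if_match(&self, expected: u64, new: u64) -> bool {
        if self.always_conflict {
            self.injected.set(self.injected.get() + 1);
            return false; // simulated lost CAS
        }
        self.inner.write_if_match(expected, new)
    }
}

/// A write path that goes through the wrapped seam.
fn publish_via_adapter(storage: &dyn Storage) -> bool {
    storage.write_if_match(0, 1)
}

/// A write path that publishes through a different mechanism and never
/// touches the wrapped seam, so no fault can reach it.
fn publish_bypassing_adapter(_storage: &dyn Storage, manifest: &Cell<u64>) -> bool {
    manifest.set(manifest.get() + 1);
    true
}
```

If the engine's publish behaves like `publish_bypassing_adapter`, `injected` stays at zero even at a 100% conflict rate, and a closing `assert!(adapter.injected.get() > 0)` would expose the gap instead of reporting false coverage.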

chatgpt-codex-connector · 2026-06-25T10:48:21Z

+        for h in handles {
+            let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery


Surface panics from concurrent actors

Ignoring JoinError here hides any actor panic that does not leave durable corruption behind. For example, a novel panic in load_jsonl, mutate, or optimize could abort one worker early while the post-join structural battery still passes, causing this concurrency test to report success instead of the panic it was supposed to contain and classify.
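A join loop that collects panics instead of discarding them might look like the sketch below. It uses `std::thread` as a stand-in for the async tasks in the diff (the `h.await` there implies an async runtime, which this example deliberately avoids); `join_all_surfacing_panics` and `spawn_actors` are hypothetical names.

```rust
use std::thread;

/// Join every actor and return one message per panicked actor,
/// rather than dropping the join error.
fn join_all_surfacing_panics(handles: Vec<thread::JoinHandle<()>>) -> Vec<String> {
    let mut panics = Vec::new();
    for (i, h) in handles.into_iter().enumerate() {
        if let Err(payload) = h.join() {
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "<non-string panic payload>".to_string());
            panics.push(format!("actor {i} panicked: {msg}"));
        }
    }
    panics
}

/// Two toy actors: one finishes cleanly, one panics without leaving
/// any durable state behind for a post-join battery to notice.
fn spawn_actors() -> Vec<thread::JoinHandle<()>> {
    vec![
        thread::spawn(|| {}),
        thread::spawn(|| {
            panic!("novel actor failure");
        }),
    ]
}
```

The returned messages can then be fed through the same known/novel classification as other findings, so a panic that leaves no corruption still fails the test when it is novel.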

greptile-apps · 2026-06-25T10:50:32Z

+        return Err(Finding::Logical(format!(
+            "count Person={p} != model={} (lost-write or dup-@key)",
+            model.persons()
+        )));


Lost Writes Become Known Bugs

When count Person differs from the model for any reason, this message includes the dup-@key token that classify() allow-lists. A lost write after a faulted op, repair, or reopen can therefore be recorded as the known duplicate-key bug instead of failing the DST walk.

Suggested change

-        return Err(Finding::Logical(format!(
-            "count Person={p} != model={} (lost-write or dup-@key)",
-            model.persons()
-        )));
+        return Err(Finding::Logical(format!(
+            "count Person={p} != model={} (lost-write or unexpected duplicate row)",
+            model.persons()
+        )));

greptile-apps · 2026-06-25T10:50:33Z

+| repair | ✅ | walk (`Repair`) |
+| read | ✅ | walk, readshape |
+| branch create/write/merge | 🟡 | `branch_isolation_and_merge` (scenario, not generic walk) |
+| **`open` / recovery sweep** | ❌ | **only a fixture step (`reopen`), never a generated op — the `#296` gap** |


Recovery Coverage Row Is Stale

This row still marks open / recovery as unsampled, while the same PR adds Reopen to the walk and marks the failpoint recovery cells done later in the file. The ledger now contradicts the harness and can mislead future gap-closure work.

Suggested change

-| **`open` / recovery sweep** | ❌ | **only a fixture step (`reopen`), never a generated op — the `#296` gap** |
+| **`open` / recovery sweep** | ✅ | walk (`Reopen`) + failpoint recovery cells (`dst_recovery`) |

ragnorc · 2026-06-26T17:06:19Z

Superseded by #309, which consolidates this comprehensive harness with #305 (fuzz/S3) and the omnigraph-dst crate extraction into a single tests-only PR retargeted to main. All commits from this branch are included in #309's cumulative diff. The review findings raised here (count==model lost-write masking, swallowed actor panics, FaultAdapter scope wording, the stale MATRIX recovery row) are addressed in #309. The determinism seam stays separate in #304.

ragnorc added 3 commits June 25, 2026 12:21

chore: lock new test-only dev-deps (arrow-array, futures, serde_json,…

76fa48f

… proptest-state-machine)

This was referenced Jun 25, 2026

test(dst): in-tree DST harness for the morphological matrix (iss-784) #300

Closed

test(dst): comprehensive DST harness — D2/D3 matrix, more oracles, shrinking, concurrency #302

Closed

chatgpt-codex-connector Bot reviewed Jun 25, 2026

greptile-apps Bot reviewed Jun 25, 2026

ragnorc closed this Jun 26, 2026

ragnorc wants to merge 3 commits into main from dst-harness-harden
