
test(dst): comprehensive DST harness on 0.7.2 + completeness-critic hardening (supersedes #300, #302) #303

Closed
ragnorc wants to merge 3 commits into main from dst-harness-harden

ragnorc commented Jun 25, 2026
Contributor

Supersedes #300 and #302 — consolidates the full DST harness onto current main (0.7.2) in one PR, plus the completeness-critic hardening pass. Tests-only (no engine src changes).

The harness (B1–B6, was #300 + #302)

A seeded, model-based DST harness over the morphological matrix (D1 ops · D2 fragment morphology · D3 read shapes · D4 oracles · D5 context):

  • White-box invariant battery after every op: HEAD==manifest, Dataset::validate, unique live _rowid (RC-X corruption class, deletion-vector-correct), index-probe, count==model, content==model, edges==model (RI), @key-unique.
  • Generative walk (panic-robust via catch_unwind + classify_panic), branch isolation/merge scenario, read-shape battery × 4 morphologies, concurrent multi-actor walk, and a proptest-state-machine shrinking campaign over the clean op subset.
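The loop described above (seeded ops, a reference model, and an invariant battery after every step) can be sketched in miniature. This is an illustrative toy, not the harness itself: `ToyStore`, `Op`, `check`, and `walk` are hypothetical names, and the store is an in-memory log with tombstones standing in for the real engine.

```rust
use std::collections::BTreeMap;

/// Tiny deterministic PRNG (xorshift64*), so a seed fully determines the walk.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        self.0.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }
}

/// Hypothetical op alphabet; a real harness would have a much larger one.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Insert(u64, u64),
    Delete(u64),
}

/// Stand-in for the system under test: an append-only log with tombstones.
#[derive(Default)]
struct ToyStore {
    rows: Vec<(u64, u64, bool)>, // (key, value, live)
}

impl ToyStore {
    fn apply(&mut self, op: &Op) {
        match op {
            Op::Insert(k, v) => {
                self.tombstone(*k);
                self.rows.push((*k, *v, true));
            }
            Op::Delete(k) => self.tombstone(*k),
        }
    }

    fn tombstone(&mut self, key: u64) {
        for row in self.rows.iter_mut().filter(|r| r.0 == key) {
            row.2 = false;
        }
    }

    fn live(&self) -> Vec<(u64, u64)> {
        self.rows.iter().filter(|r| r.2).map(|r| (r.0, r.1)).collect()
    }
}

/// Invariant battery: count == model, unique live keys, content == model.
fn check(store: &ToyStore, model: &BTreeMap<u64, u64>) -> Result<(), String> {
    let live = store.live();
    if live.len() != model.len() {
        return Err(format!("count {} != model {}", live.len(), model.len()));
    }
    let as_map: BTreeMap<u64, u64> = live.iter().copied().collect();
    if as_map.len() != live.len() {
        return Err("duplicate live key".to_string());
    }
    if &as_map != model {
        return Err("content != model".to_string());
    }
    Ok(())
}

/// Runs `steps` seeded ops, checking the battery after every one.
/// Returns the op trace so a failure can be replayed from the seed.
fn walk(seed: u64, steps: usize) -> Result<Vec<Op>, String> {
    let mut rng = Rng(seed.max(1)); // xorshift state must be non-zero
    let (mut store, mut model) = (ToyStore::default(), BTreeMap::new());
    let mut trace = Vec::with_capacity(steps);
    for _ in 0..steps {
        let key = rng.next() % 8;
        let op = if rng.next() % 3 == 0 {
            Op::Delete(key)
        } else {
            Op::Insert(key, rng.next())
        };
        store.apply(&op);
        match &op {
            Op::Insert(k, v) => { model.insert(*k, *v); }
            Op::Delete(k) => { model.remove(k); }
        }
        check(&store, &model).map_err(|e| format!("after {:?}: {}", op, e))?;
        trace.push(op);
    }
    Ok(trace)
}
```

Calling `walk(42, 200)` twice yields the same trace, which is what makes a failing seed replayable.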

Bugs guarded (5)

Reproduced/pinned as characterization tests, all still present on 0.7.2 (verified):

| bug | layer | guard |
|---|---|---|
| RC-X / Lance #7230 (scalar-BTREE dup row addr) | Lance (fixed v8.0.0) | regression_rc_x_… + index-probe |
| RC-1 stale-view manifest CAS on delete combos | Omnigraph | regression_rc1_… + HEAD==manifest |
| dup-@key (MR-714) | Omnigraph | regression_dup_key_… + concurrent walk |
| self-loop Knows not traversable (NEW) | Omnigraph (Expand) | regression_self_loop_… + edges==model |
| Lance FTS inverted-builder OOB panic (NEW) | Lance | caught + classified by the walk |

Hardening (completeness-critic pass)

  • dst/MATRIX.md — a coverage ledger: D1–D5 sampled/unsampled cells, the hidden dimensions found only after a miss, and why exhaustive sampling is impossible. A living artifact, re-run when an external bug slips through.
  • Reopen op — the recovery sweep is now a first-class walk op.
  • Critic finding: the Phase-2 FaultAdapter wraps StorageAdapter::write_text_if_match, but the manifest publish is a Lance MergeInsertBuilder CAS that never flows through that method, so the adapter could not induce a recovery sidecar.
  • Fault seam closed via a --features failpoints variant in its own binary (tests/dst_recovery.rs): one cell induces a real RolledPastExpected sidecar, and the other forces a concurrent-open race through a rendezvous (the #296 blind spot) and asserts that the CAS-loser converges. Run: cargo test -p omnigraph-engine --features failpoints --test dst_recovery.
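The rendezvous-forced race can be illustrated with standard-library primitives. The sketch below is an assumption-laden toy: an `AtomicU64` stands in for the manifest head, a `Barrier` stands in for the failpoint rendezvous, and `sweep`, `Sweep`, and `race_two_sweeps` are hypothetical names rather than anything in the engine.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Outcome of one simulated recovery sweep.
#[derive(Debug, PartialEq)]
enum Sweep {
    Published, // this sweep won the CAS and rolled the head forward
    Converged, // lost the CAS, but the target version was already in place
}

/// Rolls `head` forward to `target` from whatever this sweep read.
fn sweep(head: &AtomicU64, gate: &Barrier, target: u64) -> Result<Sweep, String> {
    let expected = head.load(Ordering::SeqCst); // this sweep's view of the head
    gate.wait(); // rendezvous: both sweeps have read before either publishes
    match head.compare_exchange(expected, target, Ordering::SeqCst, Ordering::SeqCst) {
        Ok(_) => Ok(Sweep::Published),
        // Losing the CAS is fine if the other sweep already reached the target.
        Err(seen) if seen == target => Ok(Sweep::Converged),
        Err(seen) => Err(format!("unexpected head {}", seen)),
    }
}

/// Spawns two sweeps over one pending roll-forward (head 1 -> 2).
fn race_two_sweeps() -> Vec<Result<Sweep, String>> {
    let head = Arc::new(AtomicU64::new(1));
    let gate = Arc::new(Barrier::new(2));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let (head, gate) = (Arc::clone(&head), Arc::clone(&gate));
            thread::spawn(move || sweep(&head, &gate, 2))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("sweep panicked"))
        .collect()
}
```

Because both sweeps read the head before the barrier releases, exactly one publishes and the other must take the convergence branch. Without the barrier, the second sweep could simply observe the finished state and the race would never be exercised.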

Why 0.7.2

The only fix between 0.7.1 and 0.7.2 (#296, recovery roll-forward convergence) addresses a different bug than any the harness catches. This was confirmed by running the harness against 0.7.2: all 5 guards stay green, meaning the guarded bugs still reproduce. The #296 cell is now sampled so its class can't slip through again.

Run

  • cargo test -p omnigraph-engine --test dst (10 tests)
  • cargo test -p omnigraph-engine --features failpoints --test dst_recovery (2 cells)

New test-only dev-deps: arrow-array, futures, serde_json, async-trait, proptest-state-machine.


Note

Low Risk
Test-only additions and dev-dependencies; no production engine code paths change.

Overview
Tests-only — adds a deterministic-simulation / morphological-matrix (DST) harness under crates/omnigraph/tests/dst/ with entrypoints tests/dst.rs and tests/dst_recovery.rs (no engine src changes).

The harness drives the real omnigraph-engine through a seeded op alphabet, a reference model, and a white-box invariant battery after each step (HEAD==manifest, Dataset::validate, unique live _rowid, index probe, count/content/edges vs model, @key checks). Findings are classified as known open bugs (allow-listed) vs novel (fail). Coverage is tracked; dst/MATRIX.md documents sampled vs gap matrix dimensions and critic findings (e.g. FaultAdapter CAS seam vs manifest publish path).

New scenarios: named characterization regressions (RC-1, RC-X, dup-@key, self-loop traversal, etc.); generative walks with optional FaultAdapter CAS faults, mid-walk Reopen, and panic containment; concurrent multi-actor structural walk; branch isolation/merge; read-shape battery × table morphologies; proptest-state-machine shrinking on clean ops.
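Panic containment in a generative walk might look roughly like the following. The PR names `catch_unwind` and `classify_panic`, but their actual signatures are not shown, so everything here (`PanicClass`, `panic_message`, `run_contained`, and the allow-listed message fragment) is a hypothetical sketch.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

#[derive(Debug, PartialEq)]
enum PanicClass {
    Known(&'static str), // matches an allow-listed signature
    Novel(String),       // anything else should fail the walk
}

/// Extracts the panic message; payloads are usually `&str` or `String`.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "<non-string panic payload>".to_string()
    }
}

/// Hypothetical allow-list keyed on a message fragment.
fn classify_panic(msg: &str) -> PanicClass {
    if msg.contains("index out of bounds") {
        PanicClass::Known("oob-in-index-builder")
    } else {
        PanicClass::Novel(msg.to_string())
    }
}

/// Runs one op, converting a panic into a classified finding.
fn run_contained<F: FnOnce()>(op: F) -> Option<PanicClass> {
    match panic::catch_unwind(AssertUnwindSafe(op)) {
        Ok(()) => None,
        Err(payload) => Some(classify_panic(&panic_message(&*payload))),
    }
}
```

`AssertUnwindSafe` is used here only because the toy op has no shared state to leave inconsistent. A real walk would still need the post-op invariant battery to judge whatever state a panicking op left behind.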

Failpoint binary (--features failpoints --test dst_recovery): real pending recovery sidecars and deterministic concurrent-open race (#296), isolated from parallel main DST runs.

Dev-deps: proptest-state-machine, plus arrow-array, futures, serde_json, async-trait for the harness.

Reviewed by Cursor Bugbot for commit 76fa48f.

Greptile Summary

This PR adds a test-only DST harness for Omnigraph. The main changes are:

  • Seeded operation walks with model and invariant checks.
  • Regression guards for known Lance and Omnigraph bugs.
  • Read-shape and morphology coverage tests.
  • Failpoint-gated recovery race tests.
  • A coverage ledger for sampled and deferred cells.

Confidence Score: 4/5

The DST harness needs a small correctness fix before merging.

  • A generic Person count mismatch can be classified as the known duplicate-key bug.
  • That can hide lost-write failures in the new walk.
  • The new build and failpoint test surfaces otherwise look consistent.

crates/omnigraph/tests/dst/model.rs

Important Files Changed

| Filename | Overview |
|---|---|
| crates/omnigraph/tests/dst.rs | Adds the main DST test binary with seeded walks, regression guards, read-shape coverage, and durability checks. |
| crates/omnigraph/tests/dst/model.rs | Adds the reference model and invariant checks, with one count-mismatch message that can be over-allow-listed. |
| crates/omnigraph/tests/dst/invariants.rs | Adds known-bug classification and the invariant battery used by the walk. |
| crates/omnigraph/tests/dst/op.rs | Defines the generated operation alphabet and model update rules. |
| crates/omnigraph/tests/dst/recovery_walk.rs | Adds failpoint-driven recovery and concurrent-open race coverage. |
| crates/omnigraph/tests/dst/statemachine.rs | Adds a proptest-state-machine campaign over the clean operation subset. |
| crates/omnigraph/tests/dst/MATRIX.md | Documents DST coverage and remaining gaps, with one stale recovery row. |
| crates/omnigraph/Cargo.toml | Adds the test-support dependencies used by the new harness. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
  A[Seeded DST walk] --> B[Run engine op]
  B --> C{Op succeeds?}
  C -->|yes| D[Update model]
  C -->|no| E[Classify error]
  D --> F[Run invariant battery]
  E --> F
  F --> G{Finding?}
  G -->|known| H[Record known bug]
  G -->|novel| I[Fail test]
  G -->|none| A
  A --> J[Reopen durability check]
  J --> F


Reviews (1): Last reviewed commit: "chore: lock new test-only dev-deps (arro..."

Greptile also left 2 inline comments on this PR.

Context used:

  • AGENTS.md
  • CLAUDE.md

ragnorc added 3 commits June 25, 2026 12:21
…, critic findings)

Port: brings the full harness (B1-B6) onto current main (0.7.2). Verified
all 5 bug guards still reproduce on 0.7.2 — its only fix (#296 recovery
roll-forward convergence) addresses a DIFFERENT bug than any the harness
catches. Adds test-only dev-deps (arrow-array, futures, serde_json,
async-trait, proptest-state-machine).

Harden (completeness-critic pass):
- MATRIX.md — the coverage ledger: enumerates D1-D5 sampled/unsampled
  cells, names the HIDDEN dimensions found only after a miss, and explains
  why exhaustive sampling is impossible (path-spaces / non-enumerable
  schedules / open-world model). Prioritized gap backlog.
- Reopen op — open/recovery sweep is now a first-class walk op (drop +
  reopen mid-walk), sampled across varied table states, not only at the
  end. ops 10/10, 0 novel violations.
- Critic finding #1 (#296 cell): concurrent open / >=2 recovery sweeps on
  one sidecar is unsampled — the named blind spot that let #296 through.
- Critic finding #2 (fault seam): the FaultAdapter wraps
  StorageAdapter::write_text_if_match, but the __manifest publish is a
  Lance MergeInsertBuilder CAS (publisher.rs:377) that never flows through
  it — verified empirically (cas_pct=100 write succeeds, no sidecar). So
  Phase-2 'manifest CAS fault injection' is off the hot path; real
  injection needs failpoints or a Lance object_store wrapper. This is the
  prerequisite for closing the #296 cell.

The StorageAdapter FaultAdapter is off the manifest-publish path, so it
can't induce a recovery sidecar. Failpoints can. New tests/dst_recovery.rs
(own binary — the process-global fail registry must not leak into the
main, non-serial walks; verified the in-binary version broke 3 parallel
tests):

- recovery_rolls_forward_under_finalize_failure: mutation.post_finalize_pre_publisher
  leaves a real RolledPastExpected sidecar; reopen rolls it forward; the
  harness WHITE-BOX structural battery holds (additive vs the engine's
  count-only test).
- concurrent_opens_converge_on_pending_sidecar (the #296 cell): an inline
  park-first rendezvous at recovery.before_roll_forward_publish forces two
  open sweeps to race one sidecar; the CAS-loser must CONVERGE, not fail the
  open. Non-vacuous (rendezvous panics if the race never fires); guards
  0.7.2 (the failpoint was added with the #296 fix, so it can't run on true
  0.7.1).

MATRIX.md backlog: fault-seam (#1), #296 cell (#2), Reopen op (#3) now
DONE; cross-process + generative-via-object_store remain.

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76fa48f35c


Comment on lines +119 to +120
return Err(Finding::Logical(format!(
"count Person={p} != model={} (lost-write or dup-@key)",


P2: Stop labeling all Person count drift as dup-key

When the Person row count diverges, this message always includes the dup-@key marker, and classify() allow-lists any logical finding containing that marker as MR-714. In the faulted walk, a lost write (count < model) or a write that committed after returning an error (count > model) will therefore be recorded as a known dup-key issue instead of failing, so the count==model oracle cannot catch the silent-loss cases it is meant to guard.
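One way to avoid this kind of over-allow-listing is to classify on a typed finding rather than on a substring of the message. The sketch below is hypothetical: `Finding`, `Verdict`, `classify`, and `check_count` are illustrative stand-ins and do not mirror the harness's real types.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Finding {
    DuplicateKey { key: String, copies: usize },
    CountDrift { actual: usize, model: usize },
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Known(&'static str),
    Novel,
}

/// Allow-lists on the variant, never on message text.
fn classify(finding: &Finding) -> Verdict {
    match finding {
        Finding::DuplicateKey { .. } => Verdict::Known("MR-714"),
        Finding::CountDrift { .. } => Verdict::Novel,
    }
}

/// Checks live keys against the model's expected row count.
/// A duplicate is reported only when one is actually observed.
fn check_count(live_keys: &[&str], model_count: usize) -> Result<(), Finding> {
    let mut copies: BTreeMap<&str, usize> = BTreeMap::new();
    for key in live_keys {
        *copies.entry(*key).or_insert(0) += 1;
    }
    if let Some((key, n)) = copies.iter().find(|(_, n)| **n > 1) {
        return Err(Finding::DuplicateKey { key: key.to_string(), copies: *n });
    }
    if live_keys.len() != model_count {
        return Err(Finding::CountDrift { actual: live_keys.len(), model: model_count });
    }
    Ok(())
}
```

With this split, a lost write (`check_count(&["a"], 2)`) surfaces as `CountDrift` and is judged novel, while only an observed duplicate maps to the known bug.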


Comment on lines +85 to +86
if self.roll(self.cas_conflict_pct) {
return Ok(None);


P2: Inject faults on a seam the engine write path uses

In the embedded engine context opened by open_faulted, this override never exercises graph manifest publish failures: the engine publishes __manifest through Lance MergeInsertBuilder/ManifestBatchPublisher, and a repo-wide check of write_text_if_match shows it is used by cluster state/adapter tests rather than the Omnigraph write path. As a result, seeded_op_loop_with_cas_faults can run as another clean walk with no injected CAS conflicts, giving false coverage for the manifest-fault cell.
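A cheap guard against this kind of false coverage is to count injected faults and fail the walk when the count is zero. The following is a minimal sketch under stated assumptions: the `Storage` trait, its `publish_manifest` method, and `faulted_walk` are invented for illustration, and only the method name `write_text_if_match` is taken from the PR.

```rust
use std::cell::Cell;

/// Hypothetical storage seam with two independent write paths.
trait Storage {
    fn write_text_if_match(&self, value: &str) -> Result<(), String>;
    fn publish_manifest(&self, value: &str) -> Result<(), String>;
}

struct Plain;

impl Storage for Plain {
    fn write_text_if_match(&self, _value: &str) -> Result<(), String> { Ok(()) }
    fn publish_manifest(&self, _value: &str) -> Result<(), String> { Ok(()) }
}

/// Intercepts only `write_text_if_match` and counts every injected fault.
struct FaultAdapter<S: Storage> {
    inner: S,
    inject: bool,
    injected: Cell<u32>,
}

impl<S: Storage> Storage for FaultAdapter<S> {
    fn write_text_if_match(&self, value: &str) -> Result<(), String> {
        if self.inject {
            self.injected.set(self.injected.get() + 1);
            return Err("injected CAS conflict".to_string());
        }
        self.inner.write_text_if_match(value)
    }

    // Not intercepted: faults can never fire on this path.
    fn publish_manifest(&self, value: &str) -> Result<(), String> {
        self.inner.publish_manifest(value)
    }
}

/// Drives `write` for `steps` iterations, then rejects a vacuous run.
fn faulted_walk<F>(steps: u32, write: F) -> Result<u32, String>
where
    F: Fn(&FaultAdapter<Plain>) -> Result<(), String>,
{
    let adapter = FaultAdapter { inner: Plain, inject: true, injected: Cell::new(0) };
    for _ in 0..steps {
        let _ = write(&adapter); // injected errors are expected here
    }
    match adapter.injected.get() {
        0 => Err("vacuous: no fault was ever injected".to_string()),
        n => Ok(n),
    }
}
```

A walk whose writes go through `publish_manifest` returns the vacuous error even with injection enabled, which is the symptom the comment describes. The same walk routed through `write_text_if_match` reports one injected fault per step.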


Comment on lines +286 to +287
for h in handles {
let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery


P2: Surface panics from concurrent actors

Ignoring JoinError here hides any actor panic that does not leave durable corruption behind. For example, a novel panic in load_jsonl, mutate, or optimize could abort one worker early while the post-join structural battery still passes, causing this concurrency test to report success instead of the panic it was supposed to contain and classify.
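Surfacing actor panics instead of discarding the join result could be sketched as below. The reviewed code awaits async task handles. This example swaps in `std::thread` so it stays self-contained, and `join_all` and `run_actors` are hypothetical names.

```rust
use std::thread;

/// Joins every worker and returns the panic messages instead of dropping them.
fn join_all(handles: Vec<thread::JoinHandle<()>>) -> Vec<String> {
    let mut panics = Vec::new();
    for (i, handle) in handles.into_iter().enumerate() {
        if let Err(payload) = handle.join() {
            // Panic payloads are usually `&str` or `String`.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "<non-string payload>".to_string());
            panics.push(format!("actor {}: {}", i, msg));
        }
    }
    panics
}

/// Spawns three toy actors; the middle one panics mid-op.
fn run_actors() -> Vec<String> {
    let handles: Vec<thread::JoinHandle<()>> = (0..3)
        .map(|i| {
            thread::spawn(move || {
                if i == 1 {
                    panic!("actor failed mid-op");
                }
            })
        })
        .collect();
    join_all(handles)
}
```

The returned messages can then be fed to the same known-versus-novel classification as any other finding, so a panic that leaves no durable corruption still fails the test.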


Comment on lines +119 to +122
return Err(Finding::Logical(format!(
"count Person={p} != model={} (lost-write or dup-@key)",
model.persons()
)));


P1 Lost Writes Become Known Bugs

When count Person differs from the model for any reason, this message includes the dup-@key token that classify() allow-lists. A lost write after a faulted op, repair, or reopen can therefore be recorded as the known duplicate-key bug instead of failing the DST walk.

Suggested change
- return Err(Finding::Logical(format!(
- "count Person={p} != model={} (lost-write or dup-@key)",
- model.persons()
- )));
+ return Err(Finding::Logical(format!(
+ "count Person={p} != model={} (lost-write or unexpected duplicate row)",
+ model.persons()
+ )));


| repair | ✅ | walk (`Repair`) |
| read | ✅ | walk, readshape |
| branch create/write/merge | 🟡 | `branch_isolation_and_merge` (scenario, not generic walk) |
| **`open` / recovery sweep** | ❌ | **only a fixture step (`reopen`), never a generated op — the `#296` gap** |


P2 Recovery Coverage Row Is Stale

This row still marks open / recovery as unsampled, while the same PR adds Reopen to the walk and marks the failpoint recovery cells done later in the file. The ledger now contradicts the harness and can mislead future gap-closure work.

Suggested change
- | **`open` / recovery sweep** | | **only a fixture step (`reopen`), never a generated op — the `#296` gap** |
+ | **`open` / recovery sweep** | | walk (`Reopen`) + failpoint recovery cells (`dst_recovery`) |


ragnorc commented Jun 26, 2026

Contributor Author

Superseded by #309, which consolidates this comprehensive harness with #305 (fuzz/S3) and the omnigraph-dst crate extraction into a single tests-only PR retargeted to main. All commits from this branch are included in #309's cumulative diff. The review findings raised here (count==model lost-write masking, swallowed actor panics, FaultAdapter scope wording, the stale MATRIX recovery row) are addressed in #309. The determinism seam stays separate in #304.

ragnorc closed this Jun 26, 2026