test(dst): comprehensive DST harness — D2/D3 matrix, more oracles, shrinking, concurrency by ragnorc · Pull Request #302 · ModernRelay/omnigraph

ragnorc · 2026-06-25T09:08:04Z

Stacked on #300 (base dst-harness-iss784). Tests-only — 0 engine src changes, 10 tests green.

Takes the shipped DST spine (#300) to the comprehensive harness: completes the D2/D3 matrix, adds the remaining D4 oracles, shrinking, and real concurrency.

What's added

B1 — live-_rowid uniqueness invariant (the RC-X duplicate-row-address corruption class), deletion-vector-correct.
B2 — content==model (per-key Doc body) + reopen==pre_state durability gate.
B3 — Knows edges + edges==model RI oracle; Repair op; branch_isolation + merge_correctness scenario.
B4 — readshape.rs: 12 read shapes × 4 fragment morphologies × on-branch (no-crash + count oracles).
B5 — statemachine.rs: proptest-state-machine auto-shrinking + proptest-regressions/ persistence over the clean op subset.
B6 — concurrent multi-actor walk + interleaving-invariant oracle (reproduces dup-@key MR-714 generically).

The generative walk is now panic-robust (catch_unwind + classify_panic): a substrate crash is classified like any finding, not a suite abort.

Two NEW findings surfaced by the harness (beyond the 3 known bugs)

Self-loop Knows not traversable — a self-loop is committed to edge:Knows (raw count 1) but $a knows $b returns 0, durable across optimize+reopen. The CSR build keeps it, so the drop is in Expand. proptest_equivalence misses it (it only checks CSR-vs-indexed mode equivalence; both drop self-loops alike). Captured as regression_self_loop_not_traversable.
Lance FTS inverted-builder OOB panic — at depth, rebuilding the FTS index after delete+optimize panics in lance-index inverted/builder.rs:856. The FTS index is auto-built on the String @key column (node_prop_index_kind maps String-non-enum → FTS; the @key → BTREE note in docs/dev/lance.md is stale). Non-deterministic (Lance's index-build parallelism isn't seeded); caught + classified by the walk.

Notes

B1 was corrected mid-development: the initial raw row_id_meta-range form false-positived after UPDATE+compaction (tombstoned ids legitimately overlap); the shipped form scans live _rowid.
New dev-deps (test target only): arrow-array, futures, serde_json, proptest-state-machine.
Out of scope (separate PRs): determinism/cfg(dst) (the one engine change) and D5 fan-out / cargo-fuzz.

Note

Low Risk
Changes are confined to test code and dev-dependencies; they do not alter production engine behavior, though CI runtime may increase.

Overview
Tests-only expansion of the omnigraph DST harness (no engine src changes). It deepens D4 oracles, widens the op alphabet, and adds several new integration scenarios while keeping known-bug allow-listing so walks can explore past RC-1, RC-X, and dup-@key.

The generative walk now runs longer (more seeds/steps), wraps ops and the invariant battery in catch_unwind with classify_panic (Lance FTS OOB, RC-X strings), and gates durability with reopen plus the full battery after drop. New invariants include live _rowid uniqueness, content==model / edges==model, no_duplicate_keys, extended RC-1 manifest matching, and dup-@key allow-list on logical findings. Model and op::step add Knows insert/delete, Repair, doc-body tracking, and cascade-aware edge state.

New modules/tests: readshape (12 query shapes × 4 fragment morphologies + branch), statemachine (proptest-state-machine shrinking on clean ops), multi-actor concurrent walk, branch isolation + merge, and characterization regressions for self-loop Knows not traversable and existing known bugs.

Dev-deps added for tests: arrow-array, futures, serde_json, proptest-state-machine.

^{Reviewed by Cursor Bugbot for commit f504f0f. Bugbot is set up for automated code reviews on this repo. Configure here.}

Greptile Summary

This PR expands the DST harness for deeper engine coverage. The main changes are:

New structural, content, row-id, and edge invariants.
More generated operations, including Knows edges and repair.
Read-shape coverage across fragment morphologies.
Reopen durability checks and panic classification.
Concurrent actor walks and state-machine shrinking tests.

Confidence Score: 4/5

The test harness changes need fixes before merging.

The locked workspace test command can fail because the new dependency is not reflected in the lockfile.
The concurrent DST path can hide actor panics and report success after a crash.

crates/omnigraph/Cargo.toml and crates/omnigraph/tests/dst.rs

Important Files Changed

Filename	Overview
crates/omnigraph/Cargo.toml	Adds test-only dependencies, but the new state-machine dependency also needs the lockfile updated.
crates/omnigraph/tests/dst.rs	Adds regression, concurrency, read-shape, reopen, and panic-handling coverage; the concurrent join path currently hides actor panics.
crates/omnigraph/tests/dst/invariants.rs	Adds panic classification plus row-id, key, and structural invariant checks.
crates/omnigraph/tests/dst/model.rs	Extends the reference model with Doc bodies and Knows edges.
crates/omnigraph/tests/dst/op.rs	Adds generated Knows operations and repair to the DST walk.
crates/omnigraph/tests/dst/readshape.rs	Adds read-shape queries over single-fragment, multi-fragment, deleted, optimized, and branch cases.
crates/omnigraph/tests/dst/statemachine.rs	Adds a proptest-state-machine campaign over the clean operation subset.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
  A[DST harness] --> B[Sequential walk]
  A --> C[Concurrent actor walk]
  A --> D[Read-shape battery]
  A --> E[State-machine campaign]
  B --> F[Model invariants]
  C --> G[Structural invariants]
  D --> H[Fragment morphology matrix]
  E --> I[Shrunk clean-op failures]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
  A[DST harness] --> B[Sequential walk]
  A --> C[Concurrent actor walk]
  A --> D[Read-shape battery]
  A --> E[State-machine campaign]
  B --> F[Model invariants]
  C --> G[Structural invariants]
  D --> H[Fragment morphology matrix]
  E --> I[Shrunk clean-op failures]

_{Reviews (1): Last reviewed commit: "test(dst): B5 — proptest-state-machine s..." | Re-trigger Greptile}

Greptile also left 2 inline comments on this PR.

Context used:

Context used - AGENTS.md (source)
Context used - CLAUDE.md (source)

Decode each fragment's row_id_meta (RowIdMeta::Inline -> read_row_ids -> row_id_range) and assert no two fragments claim overlapping stable row-id ranges. Unlike index_probe (read-surface), this fires at commit time on bad fragment metadata. Reaches Lance only via public surfaces (Snapshot::open -> Dataset::fragments + lance_table::rowids). Adds lance-table as a dev-dep; battery now 5 invariants.

@key

…gate content==model: Model tracks expected Doc.body per key; the oracle reads each live Doc (slug,body), flags a lost UPDATE (stale body) or a value- level dup-@key that count==model passes through. reopen==pre_state: after each walk seed, a fresh handle on the same bytes (recovery sweep runs) must agree with the model — a durability gate over the same battery. Battery now 6 invariants. Adds serde_json dev-dep. no_orphan_edge / branch_isolation / merge_correctness deferred to B3, where the edge + branch ops that exercise them are added.

Adds InsertKnows/DeleteKnows ops (model tracks knows as from->to with @card(0..1); del_person cascades both directions — the RC-1 surface) and the edges==model oracle: raw edge:Knows row count (white-box, sees orphans) AND traversal count must both equal the model, catching a lost node-delete cascade (orphan) or a lost edge write. Harness-surfaced FINDING: a Knows SELF-LOOP is committed to the edge table but is NOT traversable (durable across optimize+reopen; CSR keeps it, so the drop is in Expand). proptest_equivalence misses it (it only checks CSR-vs-indexed mode equivalence, both drop self-loops alike). Captured as regression_self_loop_not_traversable; self-loops excluded from the generic generator so edges==model stays unambiguous. Battery 7, ops 8.

@key

… panic-robust deeper walk - Repair op (confirm, not force): heals verified maintenance drift, leaves RC-1's semantic drift for head_eq_manifest. - B1 correctness fix: the row-id invariant now scans live _rowid (deletion-vector-aware) instead of comparing raw row_id_meta ranges, which false-positived after UPDATE+compaction (tombstoned ids in the range). Swaps lance-table dev-dep for arrow-array + futures. - classify: the RC-1 drift also surfaces as a later write's precondition refusing uncovered edge:Knows HEAD>manifest ('ahead of manifest ... run omnigraph repair') — same root, classified RC-1. - Walk is now panic-robust (catch_unwind around op + battery): a substrate crash is classified like any finding. Surfaced a 2nd harness finding: Lance's inverted (FTS) index builder OOB-panics at depth when the FTS index on the String @key column is rebuilt after delete+optimize (non-deterministic — Lance's index-build parallelism isn't seeded; the harness catches+classifies it robustly). - Deeper walk: 4 seeds x 25 steps.

Delivers the deferred branch oracles as a focused scenario (not the generic per-op walk, so the reference model stays single-branch): a branch must not observe post-fork main writes (and vice versa), and a merge converges main to the row-level union. Uses db.mutate(<branch>,..) for branch writes and ReadTarget::branch for branch reads. B3 scope note: D1 alphabet now 9 ops + branch scenario; D2 morphology is exercised incidentally by the walk (multi-load, delete, update, optimize) and explicitly in B4. cleanup/apply-schema/overwrite ops deferred as follow-ups (they fork the single-branch model for modest coverage).

New readshape.rs: a richer schema (adds Person.age:I64 for range + numeric aggregates) and 4 morphology builders (single fragment, >=2 fragments, deletion vectors, compacted). Runs 12 read shapes (scan, @key/indexed/non-indexed filter, range, order+limit, count, numeric aggregates, 1-hop, var-hop, negation, zero-match) against every morphology + a forked branch. Oracle: no-crash across the D2xD3 cells, plus morphology-invariant counts (full-scan==live persons, zero-match==0). Standalone (own schema) so the green generative walk is untouched. Vector/FTS/rrf shapes deferred (need vector data + the OOB-panicking inverted index).

@key

…oracle Seeded N-actor concurrent walk over a SHARED graph on an overlapping key space (real parallelism via multi_thread runtime). Under concurrency the sequential model doesn't apply, so the oracle is the interleaving- invariant subset: unique live row-ids, Dataset::validate, HEAD==manifest, and @key uniqueness (new model-free no_duplicate_keys). Reproduces dup-@key (MR-714) GENERICALLY; classify now allow-lists the dup-@key marker as that known bug. A Lance panic inside an actor is contained by tokio::spawn (JoinError), so the post-join battery is the judge. 0 novel violations across runs.

…bset Adds a proptest-state-machine campaign (new statemachine.rs) that drives 1..30 sequences of the CLEAN ops (insert-person/insert-doc/update-doc/ read — the subset with no known open bug) and asserts the engine tracks a pure reference model exactly: count==model, content==model (id→body), and unique live row-ids. Any divergence auto-minimizes to the shortest failing op sequence and persists its seed under proptest-regressions/. Async engine bridged the canonical way — the SUT owns a current-thread Runtime and every call is rt.block_on (plain #[test], no ambient runtime). Adds proptest-state-machine dev-dep; derives Debug on Finding.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit f504f0f. Configure here.}

cursor · 2026-06-25T09:09:55Z

+                    ),
+                }
+            }
+        }


Reopen battery skips panic handling

Medium Severity

The generative walk wraps op::step and in-loop run_battery in catch_unwind and routes known substrate panics through classify_panic, but the new post-reopen durability run_battery is invoked directly. A Lance panic during reopen (for example FTS index rebuild on open) aborts the test instead of being recorded like the same signature during the walk.

^{Reviewed by Cursor Bugbot for commit f504f0f. Configure here.}

greptile-apps · 2026-06-25T09:11:53Z

+serde_json = { workspace = true }
 serial_test = "3"
 proptest = "1"
+proptest-state-machine = "0.8"


Locked Test Build Breaks

This adds proptest-state-machine without updating Cargo.lock. The repo's documented test gate uses cargo test --workspace --locked, so the first locked build of this test target will fail before any DST tests run.

Context Used: AGENTS.md (source)

greptile-apps · 2026-06-25T09:11:54Z

+        for h in handles {
+            let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery
+        }


Actor Panics Are Hidden

When a spawned actor panics during load_jsonl, mutate, or optimize, h.await returns a JoinError and this loop discards it. If the panic leaves no persistent drift for the post-join battery to observe, the concurrency test reports success while losing the crash finding it was meant to surface.

Suggested change

for h in handles {

let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery

}

for h in handles {

h.await.expect("concurrent actor panicked");

}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f504f0f5d7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T09:11:55Z

+serde_json = { workspace = true }
 serial_test = "3"
 proptest = "1"
+proptest-state-machine = "0.8"


Commit the lockfile entry for the new test dependency

Adding proptest-state-machine changes dependency resolution, but this commit does not update Cargo.lock (there is no proptest-state-machine entry in the lockfile). The documented CI gate runs cargo test --workspace --locked, so a clean checkout of this commit will fail before building until the lockfile is regenerated and committed.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T09:11:55Z

+        // durability time, so no separate coverage cell.
+        drop(db);
+        let reopened = backend::reopen(uri).await;
+        for (name, res) in run_battery(&reopened, &model).await {


Reuse panic classification after reopening

When a walk hits a known durable Lance panic, the in-loop battery is wrapped in catch_unwind and allow-listed, but the post-reopen durability pass calls run_battery directly. If the reopened bytes trigger the same known panic again (for example via index_probe), the DST aborts instead of recording the known finding; wrap this call in the same catch_unwind/classify_panic path used above.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-25T09:11:55Z

+            }));
+        }
+        for h in handles {
+            let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery


Do not discard actor panics

If any concurrent actor panics, h.await returns a JoinError, but this line drops it unconditionally. A novel engine panic that leaves the graph structurally valid would make the test pass after the post-join battery, masking exactly the kind of concurrency failure this harness is meant to surface; inspect the join error and classify/re-raise its panic payload instead of ignoring it.

Useful? React with 👍 / 👎.

ragnorc · 2026-06-25T10:44:47Z

Superseded by #303, which consolidates the full DST harness onto current main (0.7.2) in one PR and adds the completeness-critic hardening (coverage ledger, the #296 concurrent-recovery cell, and the fault-seam closure). No content lost — #303 contains everything here, rebased onto 0.7.2.

ragnorc added 8 commits June 24, 2026 19:58

cursor Bot reviewed Jun 25, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 25, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 25, 2026

View reviewed changes

ragnorc mentioned this pull request Jun 25, 2026

test(dst): comprehensive DST harness on 0.7.2 + completeness-critic hardening (supersedes #300, #302) #303

Closed

ragnorc closed this Jun 25, 2026

ragnorc deleted the dst-harness-pr-b branch June 25, 2026 15:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(dst): comprehensive DST harness — D2/D3 matrix, more oracles, shrinking, concurrency#302

test(dst): comprehensive DST harness — D2/D3 matrix, more oracles, shrinking, concurrency#302
ragnorc wants to merge 8 commits into
dst-harness-iss784from
dst-harness-pr-b

ragnorc commented Jun 25, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 25, 2026

Uh oh!

greptile-apps Bot Jun 25, 2026

Uh oh!

greptile-apps Bot Jun 25, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Uh oh!

ragnorc commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ragnorc commented Jun 25, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What's added

Two NEW findings surfaced by the harness (beyond the 3 known bugs)

Notes

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 25, 2026

Choose a reason for hiding this comment

Reopen battery skips panic handling

Uh oh!

greptile-apps Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

ragnorc commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ragnorc commented Jun 25, 2026 •

edited by greptile-apps Bot

Loading