test(dst): comprehensive DST harness — D2/D3 matrix, more oracles, shrinking, concurrency#302
ragnorc wants to merge 8 commits into
Decode each fragment's row_id_meta (RowIdMeta::Inline -> read_row_ids -> row_id_range) and assert no two fragments claim overlapping stable row-id ranges. Unlike index_probe (read-surface), this fires at commit time on bad fragment metadata. Reaches Lance only via public surfaces (Snapshot::open -> Dataset::fragments + lance_table::rowids). Adds lance-table as a dev-dep; battery now 5 invariants.
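A minimal sketch of the overlap check this commit describes, assuming each fragment's row ids have already been decoded into a half-open `Range<u64>`. The function name `first_overlap` and the input shape are hypothetical; the harness itself decodes `row_id_meta` through Lance and may structure the check differently.

```rust
use std::ops::Range;

/// Returns the first pair of fragment indices whose stable row-id ranges overlap.
/// `ranges[i]` is a stand-in for fragment `i`'s decoded range (hypothetical shape).
pub fn first_overlap(ranges: &[Range<u64>]) -> Option<(usize, usize)> {
    // Ignore empty ranges, then visit fragments in order of range start.
    let mut order: Vec<usize> = (0..ranges.len())
        .filter(|&i| !ranges[i].is_empty())
        .collect();
    order.sort_by_key(|&i| ranges[i].start);

    // `widest` is the already-visited fragment whose range ends furthest right.
    let mut widest: Option<usize> = None;
    for &i in &order {
        if let Some(w) = widest {
            if ranges[i].start < ranges[w].end {
                return Some((w, i)); // two fragments claim the same row id
            }
        }
        if widest.map_or(true, |w| ranges[i].end > ranges[w].end) {
            widest = Some(i);
        }
    }
    None
}
```

For example, `first_overlap(&[0..10, 10..20])` finds nothing because the ranges only touch, while adding a third fragment `5..7` reports it against fragment 0.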
…gate content==model: Model tracks expected Doc.body per key; the oracle reads each live Doc (slug, body) and flags a lost UPDATE (stale body) or a value-level dup-@key that count==model passes through. reopen==pre_state: after each walk seed, a fresh handle on the same bytes (recovery sweep runs) must agree with the model, which makes it a durability gate over the same battery. Battery now 6 invariants. Adds serde_json dev-dep. no_orphan_edge / branch_isolation / merge_correctness deferred to B3, where the edge + branch ops that exercise them are added.
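The content==model comparison can be sketched roughly as follows. This assumes the model is a map from key to expected body and that the live rows arrive as `(slug, body)` pairs; `ContentFinding` and `check_content` are illustrative names, not the harness's own types.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
pub enum ContentFinding {
    DuplicateKey(String),
    StaleBody { key: String, expected: String, actual: String },
    Missing(String),
    Unexpected(String),
}

/// Compares live `(slug, body)` rows against the model's expected body per key.
pub fn check_content(
    model: &BTreeMap<String, String>,
    rows: &[(String, String)],
) -> Vec<ContentFinding> {
    let mut findings = Vec::new();
    let mut seen: BTreeSet<String> = BTreeSet::new();
    for (key, body) in rows {
        // A second live row for the same key is a value-level duplicate.
        if !seen.insert(key.clone()) {
            findings.push(ContentFinding::DuplicateKey(key.clone()));
            continue;
        }
        match model.get(key) {
            None => findings.push(ContentFinding::Unexpected(key.clone())),
            Some(expected) if expected != body => findings.push(ContentFinding::StaleBody {
                key: key.clone(),
                expected: expected.clone(),
                actual: body.clone(),
            }),
            Some(_) => {}
        }
    }
    // Keys the model expects but no live row carries.
    for key in model.keys() {
        if !seen.contains(key) {
            findings.push(ContentFinding::Missing(key.clone()));
        }
    }
    findings
}
```

With a model of two keys and two live rows that both carry key `a`, the row count matches the model, yet this check still reports a duplicate `a` and a missing `b`. That is the case count==model lets through.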
Adds InsertKnows/DeleteKnows ops (model tracks knows as from->to with @card(0..1); del_person cascades both directions — the RC-1 surface) and the edges==model oracle: raw edge:Knows row count (white-box, sees orphans) AND traversal count must both equal the model, catching a lost node-delete cascade (orphan) or a lost edge write. Harness-surfaced FINDING: a Knows SELF-LOOP is committed to the edge table but is NOT traversable (durable across optimize+reopen; CSR keeps it, so the drop is in Expand). proptest_equivalence misses it (it only checks CSR-vs-indexed mode equivalence, both drop self-loops alike). Captured as regression_self_loop_not_traversable; self-loops excluded from the generic generator so edges==model stays unambiguous. Battery 7, ops 8.
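A rough sketch of the edge side of the reference model, under two assumptions that the commit message does not spell out: a second outgoing edge from the same person is rejected (one reading of `@card(0..1)`), and self-loops are rejected to mirror their exclusion from the generator. `KnowsModel` and `edges_match` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Reference model for `Knows` edges: at most one outgoing edge per person.
#[derive(Default)]
pub struct KnowsModel {
    knows: BTreeMap<String, String>, // from -> to
}

impl KnowsModel {
    /// Returns false when the model rejects the edge (assumed policy, see above).
    pub fn insert_knows(&mut self, from: &str, to: &str) -> bool {
        if from == to || self.knows.contains_key(from) {
            return false;
        }
        self.knows.insert(from.to_string(), to.to_string());
        true
    }

    pub fn delete_knows(&mut self, from: &str) -> bool {
        self.knows.remove(from).is_some()
    }

    /// Deleting a person cascades to edges where it is the source or the target.
    pub fn delete_person(&mut self, slug: &str) {
        self.knows
            .retain(|from, to| from.as_str() != slug && to.as_str() != slug);
    }

    pub fn edge_count(&self) -> usize {
        self.knows.len()
    }
}

/// Both the raw edge-table row count and the traversal count must equal the model.
pub fn edges_match(model: &KnowsModel, raw_rows: usize, traversed: usize) -> Result<(), String> {
    let expected = model.edge_count();
    if raw_rows != expected {
        // More raw rows than the model suggests an orphan left by a lost cascade.
        return Err(format!("raw edge rows {} != model {}", raw_rows, expected));
    }
    if traversed != expected {
        return Err(format!("traversed edges {} != model {}", traversed, expected));
    }
    Ok(())
}
```

Checking the raw count and the traversal count separately is what lets the oracle tell an orphaned row (raw too high) apart from an edge that is stored but not traversable (traversal too low).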
… panic-robust deeper walk
- Repair op (confirm, not force): heals verified maintenance drift, leaves
RC-1's semantic drift for head_eq_manifest.
- B1 correctness fix: the row-id invariant now scans live _rowid
(deletion-vector-aware) instead of comparing raw row_id_meta ranges,
which false-positived after UPDATE+compaction (tombstoned ids in the
range). Swaps lance-table dev-dep for arrow-array + futures.
- classify: the RC-1 drift also surfaces as a later write's precondition
refusing uncovered edge:Knows HEAD>manifest ('ahead of manifest ...
run omnigraph repair') — same root, classified RC-1.
- Walk is now panic-robust (catch_unwind around op + battery): a
substrate crash is classified like any finding. Surfaced a 2nd
harness finding: Lance's inverted (FTS) index builder OOB-panics at
depth when the FTS index on the String @key column is rebuilt after
delete+optimize (non-deterministic — Lance's index-build parallelism
isn't seeded; the harness catches+classifies it robustly).
- Deeper walk: 4 seeds x 25 steps.
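The B1 fix in the list above can be illustrated with a simplified model of fragments. This is only a sketch: `Fragment` here is a hypothetical struct holding explicit row ids plus a tombstone set, whereas the harness scans the live `_rowid` column through Arrow.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for a fragment: physical row ids plus a deletion vector.
pub struct Fragment {
    pub row_ids: Vec<u64>,
    pub deleted: BTreeSet<u64>,
}

/// Returns every row id that is live (not tombstoned) in more than one place.
pub fn duplicate_live_row_ids(fragments: &[Fragment]) -> Vec<u64> {
    let mut seen = BTreeSet::new();
    let mut dups = BTreeSet::new();
    for f in fragments {
        // Skip tombstoned ids: they may legitimately reappear in another fragment.
        for id in f.row_ids.iter().filter(|id| !f.deleted.contains(*id)) {
            if !seen.insert(*id) {
                dups.insert(*id);
            }
        }
    }
    dups.into_iter().collect()
}
```

If one fragment holds ids 0..=2 with id 1 tombstoned and a second fragment holds a live id 1, a range comparison would flag an overlap, while this scan reports no duplicate. Remove the tombstone and id 1 is reported.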
Delivers the deferred branch oracles as a focused scenario (not the generic per-op walk, so the reference model stays single-branch): a branch must not observe post-fork main writes (and vice versa), and a merge converges main to the row-level union. Uses db.mutate(<branch>,..) for branch writes and ReadTarget::branch for branch reads. B3 scope note: D1 alphabet now 9 ops + branch scenario; D2 morphology is exercised incidentally by the walk (multi-load, delete, update, optimize) and explicitly in B4. cleanup/apply-schema/overwrite ops deferred as follow-ups (they fork the single-branch model for modest coverage).
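The two branch oracles reduce to a small set model. The sketch below tracks keys only and uses hypothetical names (`BranchModel`, `merge_into_main`); it is not the scenario's actual code, which drives the engine through `db.mutate` and `ReadTarget::branch`.

```rust
use std::collections::BTreeSet;

/// Row-level model of `main` plus one forked branch (keys only).
pub struct BranchModel {
    pub main: BTreeSet<String>,
    pub branch: BTreeSet<String>,
}

impl BranchModel {
    /// Forking copies main's rows; later writes on either side stay private.
    pub fn fork(main: BTreeSet<String>) -> Self {
        let branch = main.clone();
        BranchModel { main, branch }
    }

    pub fn write_main(&mut self, key: &str) {
        self.main.insert(key.to_string());
    }

    pub fn write_branch(&mut self, key: &str) {
        self.branch.insert(key.to_string());
    }

    /// Merge converges main to the row-level union of both sides.
    pub fn merge_into_main(&mut self) {
        self.main.extend(self.branch.iter().cloned());
    }
}
```

Isolation is then the pair of assertions that a post-fork main write is absent from the branch and a branch write is absent from main. Merge correctness is the assertion that main equals the union afterwards.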
New readshape.rs: a richer schema (adds Person.age:I64 for range + numeric aggregates) and 4 morphology builders (single fragment, >=2 fragments, deletion vectors, compacted). Runs 12 read shapes (scan, @key/indexed/non-indexed filter, range, order+limit, count, numeric aggregates, 1-hop, var-hop, negation, zero-match) against every morphology + a forked branch. Oracle: no-crash across the D2xD3 cells, plus morphology-invariant counts (full-scan==live persons, zero-match==0). Standalone (own schema) so the green generative walk is untouched. Vector/FTS/rrf shapes deferred (need vector data + the OOB-panicking inverted index).
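One way to picture the morphology-invariant count oracle is a loop over every (morphology, shape) cell. The enum variants follow the four builders named above, but `count_violations` and the closure-based query hook are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Morphology {
    SingleFragment,
    MultiFragment,
    DeletionVectors,
    Compacted,
}

pub const MORPHOLOGIES: [Morphology; 4] = [
    Morphology::SingleFragment,
    Morphology::MultiFragment,
    Morphology::DeletionVectors,
    Morphology::Compacted,
];

/// Runs `count_rows` for every cell and returns the cells whose count differs
/// from the expected, morphology-independent value for that shape.
pub fn count_violations(
    shapes: &[(&str, usize)], // (shape name, expected row count)
    count_rows: impl Fn(Morphology, &str) -> usize,
) -> Vec<(Morphology, String, usize)> {
    let mut out = Vec::new();
    for m in MORPHOLOGIES.iter().copied() {
        for &(shape, expected) in shapes {
            let got = count_rows(m, shape);
            if got != expected {
                out.push((m, shape.to_string(), got));
            }
        }
    }
    out
}
```

A full scan expected to return the live person count and a zero-match filter expected to return 0 are the two shapes the PR names as invariant across morphologies.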
…oracle Seeded N-actor concurrent walk over a SHARED graph on an overlapping key space (real parallelism via multi_thread runtime). Under concurrency the sequential model doesn't apply, so the oracle is the interleaving-invariant subset: unique live row-ids, Dataset::validate, HEAD==manifest, and @key uniqueness (new model-free no_duplicate_keys). Reproduces dup-@key (MR-714) GENERICALLY; classify now allow-lists the dup-@key marker as that known bug. A Lance panic inside an actor is contained by tokio::spawn (JoinError), so the post-join battery is the judge. 0 novel violations across runs.
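A std-only sketch of the same idea: seeded actors hammer a shared table on an overlapping key space, and a model-free check looks for duplicate keys afterwards. Standard threads and a `Mutex<Vec<String>>` stand in for the tokio multi_thread runtime and the real graph, and the per-actor generator is a plain LCG. All names here are hypothetical, and `key_space` is assumed to be non-zero.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Model-free oracle: returns every key that appears more than once.
pub fn duplicate_keys(rows: &[String]) -> Vec<String> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for k in rows {
        *counts.entry(k.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(k, _)| k.to_string())
        .collect()
}

/// Runs `actors` threads, each doing `steps` seeded check-then-insert ops on
/// a shared table. The check and the insert happen under one lock, so this
/// stand-in is race-free by construction.
pub fn concurrent_walk(actors: u64, steps: u64, key_space: u64) -> Vec<String> {
    let table: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..actors)
        .map(|actor| {
            let table = Arc::clone(&table);
            thread::spawn(move || {
                // Per-actor seed, then a 64-bit LCG for reproducible key choice.
                let mut state = actor.wrapping_mul(0x9E37_79B9_7F4A_7C15).wrapping_add(1);
                for _ in 0..steps {
                    state = state
                        .wrapping_mul(6364136223846793005)
                        .wrapping_add(1442695040888963407);
                    let key = format!("k{}", (state >> 33) % key_space);
                    let mut rows = table.lock().unwrap();
                    if !rows.contains(&key) {
                        rows.push(key);
                    }
                }
            })
        })
        .collect();
    for h in handles {
        h.join().expect("actor panicked");
    }
    let rows = table.lock().unwrap().clone();
    rows
}
```

Because the oracle never consults a sequential model, it stays valid under any interleaving. An engine that performs the uniqueness check and the write as separate steps is the kind of system where such a check can fire.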
…bset Adds a proptest-state-machine campaign (new statemachine.rs) that drives sequences of 1..30 CLEAN ops (insert-person/insert-doc/update-doc/read, the subset with no known open bug) and asserts the engine tracks a pure reference model exactly: count==model, content==model (id→body), and unique live row-ids. Any divergence auto-minimizes to the shortest failing op sequence and persists its seed under proptest-regressions/. Async engine bridged the canonical way: the SUT owns a current-thread Runtime and every call is rt.block_on (plain #[test], no ambient runtime). Adds proptest-state-machine dev-dep; derives Debug on Finding.
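To show what minimizing a failing op sequence means, here is a hand-rolled greedy shrinker. It is explicitly not proptest-state-machine's algorithm, only a simplified stand-in. The `Op` enum and the `hits_lost_update` predicate (a pretend bug that loses an update to key 3 after it was inserted) are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    InsertDoc(u8),
    UpdateDoc(u8),
    Read,
}

/// Pretend failure predicate: true when the sequence updates key 3 after
/// inserting it. A real campaign would run the ops against engine and model.
pub fn hits_lost_update(ops: &[Op]) -> bool {
    let mut inserted = false;
    for op in ops {
        match op {
            Op::InsertDoc(3) => inserted = true,
            Op::UpdateDoc(3) if inserted => return true,
            _ => {}
        }
    }
    false
}

/// Greedy shrink: drop any single op whose removal keeps the failure, and
/// repeat until no single removal does (a 1-minimal sequence).
pub fn shrink(ops: &[Op], fails: impl Fn(&[Op]) -> bool) -> Vec<Op> {
    let mut current = ops.to_vec();
    let mut i = 0;
    while i < current.len() {
        let mut candidate = current.clone();
        candidate.remove(i);
        if fails(candidate.as_slice()) {
            current = candidate; // keep the shorter failing sequence
            i = 0; // earlier ops may have become removable
        } else {
            i += 1;
        }
    }
    current
}
```

Starting from a seven-op sequence that interleaves reads and writes to key 1, the shrinker ends with just the insert and update of key 3, the two ops needed to trip the predicate.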
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit f504f0f.
    ),
    }
    }
    }
Reopen battery skips panic handling
Medium Severity
The generative walk wraps op::step and in-loop run_battery in catch_unwind and routes known substrate panics through classify_panic, but the new post-reopen durability run_battery is invoked directly. A Lance panic during reopen (for example FTS index rebuild on open) aborts the test instead of being recorded like the same signature during the walk.
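The guard this comment asks for could look roughly like the synchronous sketch below. The `Outcome` enum, the allow-list string, and `run_guarded` are assumptions; the harness's real `classify_panic` and its panic signatures are not shown in this PR page. Since the real battery is async, it would need an async-aware equivalent (the futures crate's `FutureExt::catch_unwind` is one option).

```rust
use std::panic::{self, AssertUnwindSafe};

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Ok,
    Known(&'static str),
    Novel(String),
}

/// Hypothetical allow-list: maps a panic message to a known-bug label.
pub fn classify_panic(msg: &str) -> Outcome {
    if msg.contains("index out of bounds") {
        Outcome::Known("lance-fts-oob")
    } else {
        Outcome::Novel(msg.to_string())
    }
}

/// Runs `f`, converting a panic into a classified finding instead of aborting.
pub fn run_guarded<F: FnOnce()>(f: F) -> Outcome {
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(()) => Outcome::Ok,
        Err(payload) => {
            // Panic payloads are usually `&str` or `String`.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "non-string panic payload".to_string());
            classify_panic(&msg)
        }
    }
}
```

Routing the post-reopen battery through the same wrapper as the in-loop battery would make a known signature produce the same finding in both places.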
    serde_json = { workspace = true }
    serial_test = "3"
    proptest = "1"
    proptest-state-machine = "0.8"
This adds proptest-state-machine without updating Cargo.lock. The repo's documented test gate uses cargo test --workspace --locked, so the first locked build of this test target will fail before any DST tests run.
Context Used: AGENTS.md (source)
    for h in handles {
        let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery
    }
When a spawned actor panics during load_jsonl, mutate, or optimize, h.await returns a JoinError and this loop discards it. If the panic leaves no persistent drift for the post-join battery to observe, the concurrency test reports success while losing the crash finding it was meant to surface.
    Suggested change:
    - for h in handles {
    -     let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery
    - }
    + for h in handles {
    +     h.await.expect("concurrent actor panicked");
    + }
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f504f0f5d7
    serde_json = { workspace = true }
    serial_test = "3"
    proptest = "1"
    proptest-state-machine = "0.8"
Commit the lockfile entry for the new test dependency
Adding proptest-state-machine changes dependency resolution, but this commit does not update Cargo.lock (there is no proptest-state-machine entry in the lockfile). The documented CI gate runs cargo test --workspace --locked, so a clean checkout of this commit will fail before building until the lockfile is regenerated and committed.
    // durability time, so no separate coverage cell.
    drop(db);
    let reopened = backend::reopen(uri).await;
    for (name, res) in run_battery(&reopened, &model).await {
Reuse panic classification after reopening
When a walk hits a known durable Lance panic, the in-loop battery is wrapped in catch_unwind and allow-listed, but the post-reopen durability pass calls run_battery directly. If the reopened bytes trigger the same known panic again (for example via index_probe), the DST aborts instead of recording the known finding; wrap this call in the same catch_unwind/classify_panic path used above.
        }));
    }
    for h in handles {
        let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery
If any concurrent actor panics, h.await returns a JoinError, but this line drops it unconditionally. A novel engine panic that leaves the graph structurally valid would make the test pass after the post-join battery, masking exactly the kind of concurrency failure this harness is meant to surface; inspect the join error and classify/re-raise its panic payload instead of ignoring it.
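A sketch of inspecting the join result rather than dropping it, using `std::thread` as a stand-in for the tokio tasks in the harness. `join_all` is a hypothetical helper. On the tokio side, `JoinError::is_panic` and `JoinError::into_panic` give access to the same kind of payload.

```rust
use std::thread;

/// Joins every actor and returns the panic message of each one that crashed,
/// so the caller can classify or re-raise instead of silently passing.
pub fn join_all(handles: Vec<thread::JoinHandle<()>>) -> Vec<String> {
    let mut panics = Vec::new();
    for h in handles {
        if let Err(payload) = h.join() {
            // Panic payloads are usually `&str` or `String`.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "non-string panic payload".to_string());
            panics.push(msg);
        }
    }
    panics
}
```

The returned messages can then go through the same known-bug classification as any other finding, which keeps allow-listed panics from failing the test while a novel one still surfaces.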


Stacked on #300 (base dst-harness-iss784). Tests-only: 0 engine src changes, 10 tests green.

Takes the shipped DST spine (#300) to the comprehensive harness: completes the D2/D3 matrix, adds the remaining D4 oracles, shrinking, and real concurrency.

What's added
- _rowid uniqueness invariant (the RC-X duplicate-row-address corruption class), deletion-vector-correct.
- content==model (per-key Doc body) + reopen==pre_state durability gate.
- Knows edges + edges==model RI oracle; Repair op; branch_isolation + merge_correctness scenario.
- readshape.rs: 12 read shapes × 4 fragment morphologies × on-branch (no-crash + count oracles).
- statemachine.rs: proptest-state-machine auto-shrinking + proptest-regressions/ persistence over the clean op subset.
- Seeded N-actor concurrent walk (reproduces dup-@key MR-714 generically).

The generative walk is now panic-robust (catch_unwind + classify_panic): a substrate crash is classified like any finding, not a suite abort.

Two NEW findings surfaced by the harness (beyond the 3 known bugs)
- Self-loop Knows not traversable: a self-loop is committed to edge:Knows (raw count 1) but $a knows $b returns 0, durable across optimize+reopen. The CSR build keeps it, so the drop is in Expand. proptest_equivalence misses it (it only checks CSR-vs-indexed mode equivalence; both drop self-loops alike). Captured as regression_self_loop_not_traversable.
- Lance inverted (FTS) index builder OOB panic at lance-index inverted/builder.rs:856. The FTS index is auto-built on the String @key column (node_prop_index_kind maps String-non-enum → FTS; the @key → BTREE note in docs/dev/lance.md is stale). Non-deterministic (Lance's index-build parallelism isn't seeded); caught + classified by the walk.

Notes
- The original row_id_meta-range form of the row-id invariant false-positived after UPDATE+compaction (tombstoned ids legitimately overlap); the shipped form scans live _rowid.
- Dev-deps added: arrow-array, futures, serde_json, proptest-state-machine.
- […] cfg(dst) (the one engine change) and D5 fan-out / cargo-fuzz.

Note
Low Risk
Changes are confined to test code and dev-dependencies; they do not alter production engine behavior, though CI runtime may increase.
Overview
Tests-only expansion of the omnigraph DST harness (no engine src changes). It deepens D4 oracles, widens the op alphabet, and adds several new integration scenarios while keeping known-bug allow-listing so walks can explore past RC-1, RC-X, and dup-@key.

The generative walk now runs longer (more seeds/steps), wraps ops and the invariant battery in catch_unwind with classify_panic (Lance FTS OOB, RC-X strings), and gates durability with reopen plus the full battery after drop. New invariants include live _rowid uniqueness, content==model / edges==model, no_duplicate_keys, extended RC-1 manifest matching, and dup-@key allow-list on logical findings. Model and op::step add Knows insert/delete, Repair, doc-body tracking, and cascade-aware edge state.

New modules/tests: readshape (12 query shapes × 4 fragment morphologies + branch), statemachine (proptest-state-machine shrinking on clean ops), multi-actor concurrent walk, branch isolation + merge, and characterization regressions for self-loop Knows not traversable and existing known bugs.

Dev-deps added for tests: arrow-array, futures, serde_json, proptest-state-machine.
Greptile Summary
This PR expands the DST harness for deeper engine coverage. The main changes are:
Confidence Score: 4/5
The test harness changes need fixes before merging.
crates/omnigraph/Cargo.toml and crates/omnigraph/tests/dst.rs
Important Files Changed
Flowchart
flowchart TD
    A[DST harness] --> B[Sequential walk]
    A --> C[Concurrent actor walk]
    A --> D[Read-shape battery]
    A --> E[State-machine campaign]
    B --> F[Model invariants]
    C --> G[Structural invariants]
    D --> H[Fragment morphology matrix]
    E --> I[Shrunk clean-op failures]

Reviews (1): Last reviewed commit: "test(dst): B5 - proptest-state-machine s..."