Skip to content

test(dst): comprehensive DST harness — D2/D3 matrix, more oracles, shrinking, concurrency#302

Closed
ragnorc wants to merge 8 commits into
dst-harness-iss784from
dst-harness-pr-b
Closed

test(dst): comprehensive DST harness — D2/D3 matrix, more oracles, shrinking, concurrency#302
ragnorc wants to merge 8 commits into
dst-harness-iss784from
dst-harness-pr-b

Conversation

@ragnorc

@ragnorc ragnorc commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Stacked on #300 (base dst-harness-iss784). Tests-only — 0 engine src changes, 10 tests green.

Takes the shipped DST spine (#300) to the comprehensive harness: completes the D2/D3 matrix, adds the remaining D4 oracles, shrinking, and real concurrency.

What's added

  • B1 — live-_rowid uniqueness invariant (the RC-X duplicate-row-address corruption class), deletion-vector-correct.
  • B2content==model (per-key Doc body) + reopen==pre_state durability gate.
  • B3Knows edges + edges==model RI oracle; Repair op; branch_isolation + merge_correctness scenario.
  • B4readshape.rs: 12 read shapes × 4 fragment morphologies × on-branch (no-crash + count oracles).
  • B5statemachine.rs: proptest-state-machine auto-shrinking + proptest-regressions/ persistence over the clean op subset.
  • B6 — concurrent multi-actor walk + interleaving-invariant oracle (reproduces dup-@key MR-714 generically).

The generative walk is now panic-robust (catch_unwind + classify_panic): a substrate crash is classified like any finding, not a suite abort.

Two NEW findings surfaced by the harness (beyond the 3 known bugs)

  1. Self-loop Knows not traversable — a self-loop is committed to edge:Knows (raw count 1) but $a knows $b returns 0, durable across optimize+reopen. The CSR build keeps it, so the drop is in Expand. proptest_equivalence misses it (it only checks CSR-vs-indexed mode equivalence; both drop self-loops alike). Captured as regression_self_loop_not_traversable.
  2. Lance FTS inverted-builder OOB panic — at depth, rebuilding the FTS index after delete+optimize panics in lance-index inverted/builder.rs:856. The FTS index is auto-built on the String @key column (node_prop_index_kind maps String-non-enum → FTS; the @key → BTREE note in docs/dev/lance.md is stale). Non-deterministic (Lance's index-build parallelism isn't seeded); caught + classified by the walk.

Notes

  • B1 was corrected mid-development: the initial raw row_id_meta-range form false-positived after UPDATE+compaction (tombstoned ids legitimately overlap); the shipped form scans live _rowid.
  • New dev-deps (test target only): arrow-array, futures, serde_json, proptest-state-machine.
  • Out of scope (separate PRs): determinism/cfg(dst) (the one engine change) and D5 fan-out / cargo-fuzz.

Note

Low Risk
Changes are confined to test code and dev-dependencies; they do not alter production engine behavior, though CI runtime may increase.

Overview
Tests-only expansion of the omnigraph DST harness (no engine src changes). It deepens D4 oracles, widens the op alphabet, and adds several new integration scenarios while keeping known-bug allow-listing so walks can explore past RC-1, RC-X, and dup-@key.

The generative walk now runs longer (more seeds/steps), wraps ops and the invariant battery in catch_unwind with classify_panic (Lance FTS OOB, RC-X strings), and gates durability with reopen plus the full battery after drop. New invariants include live _rowid uniqueness, content==model / edges==model, no_duplicate_keys, extended RC-1 manifest matching, and dup-@key allow-list on logical findings. Model and op::step add Knows insert/delete, Repair, doc-body tracking, and cascade-aware edge state.

New modules/tests: readshape (12 query shapes × 4 fragment morphologies + branch), statemachine (proptest-state-machine shrinking on clean ops), multi-actor concurrent walk, branch isolation + merge, and characterization regressions for self-loop Knows not traversable and existing known bugs.

Dev-deps added for tests: arrow-array, futures, serde_json, proptest-state-machine.

Reviewed by Cursor Bugbot for commit f504f0f. Bugbot is set up for automated code reviews on this repo. Configure here.

Greptile Summary

This PR expands the DST harness for deeper engine coverage. The main changes are:

  • New structural, content, row-id, and edge invariants.
  • More generated operations, including Knows edges and repair.
  • Read-shape coverage across fragment morphologies.
  • Reopen durability checks and panic classification.
  • Concurrent actor walks and state-machine shrinking tests.

Confidence Score: 4/5

The test harness changes need fixes before merging.

  • The locked workspace test command can fail because the new dependency is not reflected in the lockfile.
  • The concurrent DST path can hide actor panics and report success after a crash.

crates/omnigraph/Cargo.toml and crates/omnigraph/tests/dst.rs

Important Files Changed

Filename Overview
crates/omnigraph/Cargo.toml Adds test-only dependencies, but the new state-machine dependency also needs the lockfile updated.
crates/omnigraph/tests/dst.rs Adds regression, concurrency, read-shape, reopen, and panic-handling coverage; the concurrent join path currently hides actor panics.
crates/omnigraph/tests/dst/invariants.rs Adds panic classification plus row-id, key, and structural invariant checks.
crates/omnigraph/tests/dst/model.rs Extends the reference model with Doc bodies and Knows edges.
crates/omnigraph/tests/dst/op.rs Adds generated Knows operations and repair to the DST walk.
crates/omnigraph/tests/dst/readshape.rs Adds read-shape queries over single-fragment, multi-fragment, deleted, optimized, and branch cases.
crates/omnigraph/tests/dst/statemachine.rs Adds a proptest-state-machine campaign over the clean operation subset.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
  A[DST harness] --> B[Sequential walk]
  A --> C[Concurrent actor walk]
  A --> D[Read-shape battery]
  A --> E[State-machine campaign]
  B --> F[Model invariants]
  C --> G[Structural invariants]
  D --> H[Fragment morphology matrix]
  E --> I[Shrunk clean-op failures]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
  A[DST harness] --> B[Sequential walk]
  A --> C[Concurrent actor walk]
  A --> D[Read-shape battery]
  A --> E[State-machine campaign]
  B --> F[Model invariants]
  C --> G[Structural invariants]
  D --> H[Fragment morphology matrix]
  E --> I[Shrunk clean-op failures]
Loading

Fix All in Claude Code

Reviews (1): Last reviewed commit: "test(dst): B5 — proptest-state-machine s..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

Context used:

  • Context used - AGENTS.md (source)
  • Context used - CLAUDE.md (source)

ragnorc added 8 commits June 24, 2026 19:58
Decode each fragment's row_id_meta (RowIdMeta::Inline -> read_row_ids ->
row_id_range) and assert no two fragments claim overlapping stable
row-id ranges. Unlike index_probe (read-surface), this fires at commit
time on bad fragment metadata. Reaches Lance only via public surfaces
(Snapshot::open -> Dataset::fragments + lance_table::rowids). Adds
lance-table as a dev-dep; battery now 5 invariants.
…gate

content==model: Model tracks expected Doc.body per key; the oracle reads
each live Doc (slug,body), flags a lost UPDATE (stale body) or a value-
level dup-@key that count==model passes through. reopen==pre_state: after
each walk seed, a fresh handle on the same bytes (recovery sweep runs)
must agree with the model — a durability gate over the same battery.
Battery now 6 invariants. Adds serde_json dev-dep.

no_orphan_edge / branch_isolation / merge_correctness deferred to B3,
where the edge + branch ops that exercise them are added.
Adds InsertKnows/DeleteKnows ops (model tracks knows as from->to with
@card(0..1); del_person cascades both directions — the RC-1 surface) and
the edges==model oracle: raw edge:Knows row count (white-box, sees
orphans) AND traversal count must both equal the model, catching a lost
node-delete cascade (orphan) or a lost edge write.

Harness-surfaced FINDING: a Knows SELF-LOOP is committed to the edge
table but is NOT traversable (durable across optimize+reopen; CSR keeps
it, so the drop is in Expand). proptest_equivalence misses it (it only
checks CSR-vs-indexed mode equivalence, both drop self-loops alike).
Captured as regression_self_loop_not_traversable; self-loops excluded
from the generic generator so edges==model stays unambiguous. Battery 7,
ops 8.
… panic-robust deeper walk

- Repair op (confirm, not force): heals verified maintenance drift, leaves
  RC-1's semantic drift for head_eq_manifest.
- B1 correctness fix: the row-id invariant now scans live _rowid
  (deletion-vector-aware) instead of comparing raw row_id_meta ranges,
  which false-positived after UPDATE+compaction (tombstoned ids in the
  range). Swaps lance-table dev-dep for arrow-array + futures.
- classify: the RC-1 drift also surfaces as a later write's precondition
  refusing uncovered edge:Knows HEAD>manifest ('ahead of manifest ...
  run omnigraph repair') — same root, classified RC-1.
- Walk is now panic-robust (catch_unwind around op + battery): a
  substrate crash is classified like any finding. Surfaced a 2nd
  harness finding: Lance's inverted (FTS) index builder OOB-panics at
  depth when the FTS index on the String @key column is rebuilt after
  delete+optimize (non-deterministic — Lance's index-build parallelism
  isn't seeded; the harness catches+classifies it robustly).
- Deeper walk: 4 seeds x 25 steps.
Delivers the deferred branch oracles as a focused scenario (not the
generic per-op walk, so the reference model stays single-branch): a
branch must not observe post-fork main writes (and vice versa), and a
merge converges main to the row-level union. Uses db.mutate(<branch>,..)
for branch writes and ReadTarget::branch for branch reads.

B3 scope note: D1 alphabet now 9 ops + branch scenario; D2 morphology is
exercised incidentally by the walk (multi-load, delete, update, optimize)
and explicitly in B4. cleanup/apply-schema/overwrite ops deferred as
follow-ups (they fork the single-branch model for modest coverage).
New readshape.rs: a richer schema (adds Person.age:I64 for range +
numeric aggregates) and 4 morphology builders (single fragment, >=2
fragments, deletion vectors, compacted). Runs 12 read shapes (scan,
@key/indexed/non-indexed filter, range, order+limit, count, numeric
aggregates, 1-hop, var-hop, negation, zero-match) against every
morphology + a forked branch. Oracle: no-crash across the D2xD3 cells,
plus morphology-invariant counts (full-scan==live persons, zero-match==0).
Standalone (own schema) so the green generative walk is untouched.
Vector/FTS/rrf shapes deferred (need vector data + the OOB-panicking
inverted index).
…oracle

Seeded N-actor concurrent walk over a SHARED graph on an overlapping key
space (real parallelism via multi_thread runtime). Under concurrency the
sequential model doesn't apply, so the oracle is the interleaving-
invariant subset: unique live row-ids, Dataset::validate, HEAD==manifest,
and @key uniqueness (new model-free no_duplicate_keys). Reproduces
dup-@key (MR-714) GENERICALLY; classify now allow-lists the dup-@key
marker as that known bug. A Lance panic inside an actor is contained by
tokio::spawn (JoinError), so the post-join battery is the judge. 0 novel
violations across runs.
…bset

Adds a proptest-state-machine campaign (new statemachine.rs) that drives
1..30 sequences of the CLEAN ops (insert-person/insert-doc/update-doc/
read — the subset with no known open bug) and asserts the engine tracks a
pure reference model exactly: count==model, content==model (id→body), and
unique live row-ids. Any divergence auto-minimizes to the shortest failing
op sequence and persists its seed under proptest-regressions/. Async engine
bridged the canonical way — the SUT owns a current-thread Runtime and every
call is rt.block_on (plain #[test], no ambient runtime). Adds
proptest-state-machine dev-dep; derives Debug on Finding.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit f504f0f. Configure here.

),
}
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reopen battery skips panic handling

Medium Severity

The generative walk wraps op::step and in-loop run_battery in catch_unwind and routes known substrate panics through classify_panic, but the new post-reopen durability run_battery is invoked directly. A Lance panic during reopen (for example FTS index rebuild on open) aborts the test instead of being recorded like the same signature during the walk.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit f504f0f. Configure here.

serde_json = { workspace = true }
serial_test = "3"
proptest = "1"
proptest-state-machine = "0.8"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Locked Test Build Breaks

This adds proptest-state-machine without updating Cargo.lock. The repo's documented test gate uses cargo test --workspace --locked, so the first locked build of this test target will fail before any DST tests run.

Context Used: AGENTS.md (source)

Fix in Claude Code

Comment on lines +283 to +285
for h in handles {
let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Actor Panics Are Hidden

When a spawned actor panics during load_jsonl, mutate, or optimize, h.await returns a JoinError and this loop discards it. If the panic leaves no persistent drift for the post-join battery to observe, the concurrency test reports success while losing the crash finding it was meant to surface.

Suggested change
for h in handles {
let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery
}
for h in handles {
h.await.expect("concurrent actor panicked");
}

Fix in Claude Code

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f504f0f5d7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

serde_json = { workspace = true }
serial_test = "3"
proptest = "1"
proptest-state-machine = "0.8"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Commit the lockfile entry for the new test dependency

Adding proptest-state-machine changes dependency resolution, but this commit does not update Cargo.lock (there is no proptest-state-machine entry in the lockfile). The documented CI gate runs cargo test --workspace --locked, so a clean checkout of this commit will fail before building until the lockfile is regenerated and committed.

Useful? React with 👍 / 👎.

// durability time, so no separate coverage cell.
drop(db);
let reopened = backend::reopen(uri).await;
for (name, res) in run_battery(&reopened, &model).await {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reuse panic classification after reopening

When a walk hits a known durable Lance panic, the in-loop battery is wrapped in catch_unwind and allow-listed, but the post-reopen durability pass calls run_battery directly. If the reopened bytes trigger the same known panic again (for example via index_probe), the DST aborts instead of recording the known finding; wrap this call in the same catch_unwind/classify_panic path used above.

Useful? React with 👍 / 👎.

}));
}
for h in handles {
let _ = h.await; // ignore JoinError: a contained actor panic is judged by the battery

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not discard actor panics

If any concurrent actor panics, h.await returns a JoinError, but this line drops it unconditionally. A novel engine panic that leaves the graph structurally valid would make the test pass after the post-join battery, masking exactly the kind of concurrency failure this harness is meant to surface; inspect the join error and classify/re-raise its panic payload instead of ignoring it.

Useful? React with 👍 / 👎.

@ragnorc

ragnorc commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Superseded by #303, which consolidates the full DST harness onto current main (0.7.2) in one PR and adds the completeness-critic hardening (coverage ledger, the #296 concurrent-recovery cell, and the fault-seam closure). No content lost — #303 contains everything here, rebased onto 0.7.2.

@ragnorc ragnorc closed this Jun 25, 2026
@ragnorc ragnorc deleted the dst-harness-pr-b branch June 25, 2026 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant