Skip to content

feat(engine): retire commit-graph tables#311

Merged
ragnorc merged 8 commits into
mainfrom
ragnorc/retire-commit-graph-tables
Jun 28, 2026
Merged

feat(engine): retire commit-graph tables#311
ragnorc merged 8 commits into
mainfrom
ragnorc/retire-commit-graph-tables

Conversation

@ragnorc

@ragnorc ragnorc commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Summary

Retires the legacy commit-graph Lance datasets and makes __manifest the single source of truth for graph lineage, branch heads, and internal storage-version gating.

This PR intentionally changes the upgrade model: this binary serves exactly the current internal manifest schema (v4). Graphs written by older releases are refused on open with an explicit export/import rebuild path instead of being migrated in place.

What Changed

Commit Graph Retirement

  • Deletes the vestigial _graph_commits.lance and _graph_commit_actors.lance storage paths.
  • Makes CommitGraph a pure in-memory projection of __manifest graph_commit / graph_head rows.
  • Routes branch create/delete authority through __manifest only.
  • Removes commit-graph table opens, optimization, cleanup, instrumentation counters, old readers, and migration fixtures.

Strict Internal-Schema Versioning

  • Sets MIN_SUPPORTED == CURRENT == 4.
  • Stamps newly initialized graphs with the current internal schema version.
  • Refuses both older and newer manifest stamps at every open/publish boundary.
  • Replaces the previous in-place migration dispatcher with a small stamp/refuse gate and actionable error text.

Operator Visibility

  • Surfaces internal_schema_version in snapshot output and API response types.
  • Adds internal-schema visibility to the CLI/server surfaces covered by this branch.
  • Updates OpenAPI fixtures and related API tests.

Docs

  • Adds the export/import upgrade guide.
  • Documents strict single-version storage compatibility.
  • Updates developer invariants, Lance notes, release notes, constants, maintenance docs, and write-path docs to match the retired commit-graph tables.

Breaking Change

Graphs created by older releases with a sub-v4 internal schema are no longer opened by this binary.

Upgrade path:

  1. Run omnigraph export with the older release that can still read the graph.
  2. Create a new graph with this release using omnigraph init.
  3. Import the exported JSONL with omnigraph load.

Data, vectors, and blobs are preserved by the export/import rebuild. Commit history and branches are not preserved; export is snapshot-based, so rebuild each branch snapshot intentionally if needed.

Compatibility Test Coverage

The CLI system test now exercises the full release-to-current lifecycle using published GitHub release binaries:

  • v0.7.2
  • v0.7.1
  • v0.7.0
  • v0.6.2

For each release, the test:

  1. Downloads the platform release tarball and verifies its .sha256.
  2. Uses that legacy binary to initialize and load a graph.
  3. Creates a branch and writes branch-only data.
  4. Verifies the current v4 binary refuses the legacy graph with the export/import nudge.
  5. Exports the branch snapshot with the legacy binary.
  6. Recreates the graph with the current binary.
  7. Loads the export and verifies snapshot row counts plus query behavior.

Validation

  • cargo test -p omnigraph-cli --test system_local local_cli_legacy_release_graphs_refuse_then_export_import_into_v4 -- --nocapture
    • Passed: 1 passed; 0 failed; 28 filtered out
  • git diff --check -- crates/omnigraph-cli/tests/system_local.rs
    • Passed

Earlier branch validation also covered the focused manifest refusal tests, lineage projection tests, CLI snapshot/version tests, and docs link checks.

ragnorc added 3 commits June 27, 2026 18:09
Records the validated 6-LIST warm-write cost model, the two root causes
(un-GC'd _versions/; re-resolving latest by listing), and the layered fix
(GC + capture-once reuse), plus how commit-graph-table retirement feeds in.
Linked from docs/dev/index.md next to the RFC-013 docs.
…n, no in-place migration

Set MIN_SUPPORTED == CURRENT == 4: this binary reads exactly one `__manifest`
internal-schema version and refuses any older graph on open with a
rebuild-via-export/import message, instead of migrating it in place. Storage
format changes become a deliberate cutover, not a permanently-carried in-place
migration — the pre-release "complexity must be earned" contract.

Delete the entire in-place migration apparatus and everything that existed only
to support it: the `migrate_vN` arms + dispatcher + stamp-bump helpers + the
schema-version-floor tripwire; `migrate_on_open` (both open modes now refuse);
the legacy `_graph_commits.lance` readers + the v3 test fixtures + migration
tests + `migration.v3_to_v4.*` failpoints + the two surface guards that pinned
Lance variants only the migration matched on; and `state::merge_lineage_rows`.
Keep `read_stamp` / `stamp_current_version` / `set_stamp` /
`refuse_if_stamp_unsupported` — the seam a future one-shot converter plugs into.

`load_commit_cache_for_branch` now reads the `__manifest` projection
unconditionally (sub-v4 graphs are refused at open). Adds
`sub_current_graph_is_refused_on_open_with_rebuild_hint`.

The commit-graph TABLES are still created/used as branch-ref ledgers — their
retirement (CommitGraph -> pure `__manifest` projection) is the next commit.

BREAKING CHANGE: a graph created by omnigraph <= 0.7.2 (internal schema v3) is
refused on open. Rebuild it: `omnigraph export` with the old release, then
`omnigraph init` + `omnigraph load` with this one. Data, vectors, and blobs are
preserved; commit history and branches are not.
…lance` — CommitGraph is a pure `__manifest` projection

Since RFC-013 Phase 7, graph lineage lives in `__manifest` (`graph_commit` /
`graph_head` rows) and branch authority is `__manifest` (branch create forks it
first). The two commit-graph datasets were vestigial: `_graph_commit_actors.lance`
was never written or read; `_graph_commits.lance` carried zero commit rows and
only mirrored the manifest's branch refs (a deny-list "parallel copy"). Retire
both.

- `CommitGraph` collapses to a pure projection: drops its Lance dataset handles
  (`dataset`/`actor_dataset`) and all branch methods; `open`/`open_at_branch`/
  `refresh`/`init` open NO dataset, building the cache from
  `ManifestCoordinator::read_graph_lineage_at`. Removes ~1.4s of cold-open
  dataset opens.
- `graph_coordinator`: `commit_graph` is now non-`Option` (always a valid
  projection). `branch_create`/`branch_delete` go through `ManifestCoordinator`
  only — a single atomic op, replacing the two-step manifest-fork +
  commit-graph-fork + rollback. Deleted `create_commit_graph_branch`,
  `reclaim_commit_graph_branch`, `ensure_commit_graph_initialized`, and every
  `storage.exists(_graph_commits.lance)` gate.
- `optimize`: dropped `reconcile_commit_graph_orphans` and the two tables from
  the internal-table compaction set (now `__manifest` only).
- `instrumentation`: `INTERNAL_TABLE_DIRS` no longer lists the two tables.
- Fresh graphs create neither table; `lineage_projection.rs` now asserts both
  `.lance` dirs are absent. Deleted the obsolete commit-graph-branch-race
  failpoint tests + their failpoint names, and updated the `maintenance`
  optimize tests (one internal table, not three).

Review-pass fixes folded in:
- Removed two stale `omnigraph.rs` in-source tests the prior run missed (a
  disk-full link failure masked them): one asserting `open` probes
  `_graph_commits.lance` (the exists-gate this commit removes) — it was masked
  earlier by a disk-full link failure.
- Corrected src comments referencing deleted code (`migrate_v3_to_v4`,
  `append_commit`/`append_merge_commit`, the three-internal-table list,
  the `_graph_commits` reconcile owner) in publisher/recovery/optimize/recovery_audit.
- Narrowed `set_stamp_for_test` to `cfg(test)` (its only caller is the refusal
  test) — removes a dead-code warning in the failpoints build.

Branch create/delete atomicity improves (single atomic `__manifest` op). No
behavior change for reads or branches.

Follow-up (separate commit): the now-always-0 `IoCounts::commit_graph_reads` test
counter + its `IOTracker`, threaded through ~11 cost-test files.
@ragnorc ragnorc changed the title Retire commit-graph tables + strand storage versioning (no in-place migration) retire commit-graph tables Jun 28, 2026

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 46285c0. Configure here.

}

pub async fn refresh(&mut self) -> Result<()> {
let root = self.root_uri.clone();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing branch internal schema check

Medium Severity

load_commit_cache_for_branch loads lineage from __manifest without reading the branch’s internal-schema stamp or calling refuse_if_stamp_unsupported. Open still validates only main’s stamp, so a v4 main graph with a named branch left below v4 (or above CURRENT) can serve empty or incorrect commit history instead of the export/import or upgrade refusal the publisher already applies on writes.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 46285c0. Configure here.

Comment thread crates/omnigraph/src/db/manifest/migrations.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 46285c01a5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// only the derived `_graph_commits.lance` ref is allowed to be missing.
// `load_commit_cache_for_branch` opens the branch's `__manifest` (the
// authoritative table), so a truly absent branch fails loudly here.
let (commit_by_id, head_commit) = load_commit_cache_for_branch(root, Some(branch)).await?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Refuse unsupported stamps on branch opens

When a non-main branch's __manifest has a different internal-schema stamp than main (for example, main was already moved to v4 but a branch is still v3, or a newer binary has written only that branch), this path opens and decodes the branch without calling refuse_if_stamp_unsupported. That bypasses the new single-version contract and can make branch reads/merges silently compute lineage from an unsupported manifest shape instead of failing with the rebuild/upgrade error; the branch open path should validate the stamp for the branch being opened before load_commit_cache_for_branch scans it.

Useful? React with 👍 / 👎.

ragnorc added 4 commits June 28, 2026 14:52
After stranding storage versioning (a sub-v4 graph is refused on open), operators
could only discover the storage-format version by hitting a refusal. Surface it:

- `omnigraph version` prints an `internal-schema <N>` line (the binary's CURRENT
  storage-format version).
- `omnigraph snapshot` includes `internal_schema_version` — the GRAPH's per-branch
  on-disk stamp, read via the new `Omnigraph::internal_schema_version_of`.
- `GET /healthz` includes `internal_schema_version` (server-scoped: the binary's
  CURRENT, alongside `version`/`source_version`).

Wire: re-expose `INTERNAL_MANIFEST_SCHEMA_VERSION` as `pub` on `db::manifest`;
add `internal_schema_version: u32` to `SnapshotOutput` + `HealthOutput`;
`snapshot_payload` takes the per-graph version (the `Snapshot` does not carry it),
threaded through the embedded CLI + server snapshot callers. `openapi.json`
regenerated (two added int32 properties). Extends the existing healthz / snapshot /
version tests.
… the per-branch read gap

PR reviewers flagged that the open path validates only main's internal-schema stamp, so a branch read could decode a branch stamped outside this binary's range. The stamp is a graph-wide storage-format property (the upgrade path is a whole-graph export/import), so with one binary version every branch is always CURRENT; divergence needs concurrent multi-version writers, an unsupported topology already in one-winner-CAS territory. Gating per-branch would add a second __manifest open per non-main branch read to defend a state we do not support, unearned complexity that regresses the warm-read budget.

Keep the graph-level gate, document it at the code site (refuse_if_internal_schema_unsupported), and record the read-only residual hole as a known gap in invariants.md to close only when multi-version write topologies become supported. Also clarify the sub-floor rebuild message to say "export with the older omnigraph binary that created it."

No behavior change: HEAD already gated at the graph level.
Phase B retired _graph_commits.lance / _graph_commit_actors.lance, so no commit-graph dataset is opened and the commit_graph IOTracker term is structurally always 0. Remove IoCounts::commit_graph_reads, its total_reads() term, the commit_graph IOTracker in OpProbes, and the now-dead commit_graph_wrapper field on QueryIoProbes (it had no accessor — nothing ever attached it). Drop the 7 trivially-true assert_eq!(commit_graph_reads, 0) checks in warm_read_cost.rs and the debug-print refs in write_cost{,_s3}.rs.

Lineage and actor rows now live in __manifest (RFC-013 Phase 7), so the internal_table_scans_are_flat_in_history gate folds into the single manifest_reads flat-assertion — the manifest scan already covers them. Harness-only; no production runtime impact.
Update the always-loaded and user-facing docs to match the landed state: graph lineage lives in __manifest, the _graph_commits.lance / _graph_commit_actors.lance tables are retired, and storage is strict-single-version (no in-place migration — a sub-CURRENT graph is refused with an export/import rebuild).

Fixed stale claims in invariants.md (the migration/atomicity known-gap entry, the Truth Matrix branch-delete row, the read-path/optimize internal-table scope), lance.md (the migrate_v1_to_v2 PK bullet now reflects init-time set; removed the two deleted v3->v4 migration surface guards), testing.md (dropped the deleted migration failpoint tests; manifest-only internal-table term), writes.md (rewrote the Migration-code section to the strand model), storage.md / maintenance.md / constants.md (retired tables out of the layout, internal-table compaction scope, and the constants cheat-sheet), and AGENTS.md. Marked the retirement DONE in the RFC-013 handoff/roadmap and banner-noted the historical RFC analysis.

Added docs/user/operations/upgrade.md (the export/import rebuild recipe) and docs/dev/versioning.md (the four-axis compatibility policy: release lockstep / wire additive / storage strict-single-version / Lance pinned), cross-linked from the audience indexes and the AGENTS.md topic map, and rewrote the in-progress v0.8.0 release note for the strand model + version surfacing. check-agents-md.sh passes (65 links, 62 docs).
@ragnorc

ragnorc commented Jun 28, 2026

Copy link
Copy Markdown
Contributor Author

Resolution of the branch-stamp review comments (cursor[bot] / chatgpt-codex / greptile)

The reviewers flagged that the open path validates only main's internal-schema stamp, so a branch read could decode a branch stamped outside this binary's range. After weighing it, the resolution is to gate the stamp at the graph (main) level, not per-branch, and record the residual as a documented known gap rather than add per-branch machinery.

Why graph-level is the lower-liability choice:

  • The internal-schema stamp describes the graph's storage format — a graph-wide property. It lives per-branch only as an artifact of sitting in __manifest's schema metadata (a branched Lance dataset). The upgrade contract is graph-wide too: the export/import rebuild is whole-graph, and you can't merge across versions.
  • With one binary version, every branch is always CURRENT (init stamps main, create_branch forks the stamp, the publisher writes rows without re-stamping). A branch stamped out of range while main stays in range is only reachable with concurrent multi-version writers — a topology the strand model already declares unsupported ("this binary reads exactly ONE internal-schema version"; multi-process writers are documented one-winner-CAS territory).
  • A per-branch read gate would add a second __manifest open on every non-main branch read (regressing the RFC-013 read-latency work and tripping the warm_read_cost.rs budgets) to defend a state we don't support — unearned complexity.
  • The dangerous direction (reading a future format) stays partly covered without it: the publisher already refuses writes to a future-stamped branch (it reads its target's stamp), and a graph whose main was advanced by a newer binary is refused at open. The residual is a read-only hole in an unsupported topology.

What changed: refuse_if_internal_schema_unsupported stays single-arg (main), the per-branch refuse and its test were reverted, the greptile message-wording fix was kept, and the read-only residual is now a documented known gap in docs/dev/invariants.md (to close only when multi-version write topologies become supported — folded into the branch-manifest open the read already does, at zero extra I/O). See commit ddd194a3.

This branch also now retires the dead commit_graph_reads cost counter (24d8eaf2) and aligns the always-loaded + user-facing docs with the retirement and the strand storage-versioning model, including a new docs/user/operations/upgrade.md (export/import recipe) and docs/dev/versioning.md (the four-axis compatibility policy) (05eafc6e).

…branch stamp inheritance

Two coverage additions from PR review (P1):

(a) sub_current_graph_is_refused_then_rebuilt_via_export_import — the full operator narrative in one flow: load → export → a sub-CURRENT graph (stamp rewound below CURRENT) is refused with the export nudge → fresh init + load(export) → data present and the rebuilt graph opens. The refusal is stamp-only (read before any data), so a stamp-rewound graph is a faithful stand-in for a real older-release graph without a second binary; vector/blob fidelity stays covered by tests/export.rs.

(b) branch_inherits_main_internal_schema_stamp — proves a branch cannot diverge from main's stamp under single-binary operation (create_branch forks main's __manifest, the publisher does not re-stamp), which is why the graph-level (main-only) stamp gate is sufficient for supported inputs. A divergent branch stamp needs concurrent multi-version writers, the unsupported topology recorded as a known gap.
@ragnorc ragnorc changed the title retire commit-graph tables feat(engine): retire commit-graph tables Jun 28, 2026
@ragnorc ragnorc merged commit 7779b72 into main Jun 28, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant