docs: state cluster apply is storage-direct, not server-routed#306
Merged
Conversation
`cluster apply` reaches the object store directly — the `__cluster/` ledger and each graph's Lance datasets — never through a running omnigraph-server, so the host that runs it needs storage credentials. The rationale (declarative control plane, not a runtime mutation API) was documented in cluster-axioms.md §3/§4, and the out-of-band/direct-storage fact was stated for the maintenance verbs and init/load, but never spelled out for apply itself. - docs/user/clusters/index.md: add a day-2 note making apply's storage-direct execution and credential requirement explicit, linking the why to axioms 3/4. - skills/omnigraph/SKILL.md: extend the "init/load write storage directly (bypassing the server)" line to include cluster apply, with the same reasoning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The trailing (§5) sat right after the cluster-axioms.md §3/§4 citation, so a reader could read §5 as referring to cluster-axioms.md (whose §5 covers locked state) rather than this guide's §5. Make it an explicit same-page forward reference. Addresses Greptile P2 on #306. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "server only reads from it" wording was wrong: the data plane serves HTTP writes (mutate/load/branch) that go through the server to the graph datasets, so omnigraph-server is not read-only against object storage. The hazard is an operator granting the server read-only S3 creds and breaking runtime writes. Scope the read-only claim to cluster (control-plane) state at boot, and state that data-plane writes still need read-write storage access. Addresses Greptile P-level finding on #306. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| ``` | ||
|
|
||
| `init` and `load` write storage directly (bypassing the server); the server reads from it. Validate with `curl http://127.0.0.1:8080/healthz`, then `omnigraph snapshot <graph-uri> --json`. | ||
| `init`, `load`, and **`cluster apply`** write storage directly (bypassing the server). `cluster apply` is a storage-direct control-plane command — it reaches the object store directly (the `__cluster/` ledger *and* each graph's Lance datasets, to create/migrate/delete them), never through a running server, so the host that runs it needs storage access (the `AWS_*` contract for an `s3://` cluster). That is by design: the control plane is declarative (config → cluster), not a runtime mutation API on the serving process. The server reads **cluster** state read-only at boot, but it is not read-only against storage overall — data-plane HTTP writes (`mutate`/`load`/`branch`) still go through the server to the graph datasets, so it needs read-write storage access. Validate with `curl http://127.0.0.1:8080/healthz`, then `omnigraph snapshot <graph-uri> --json`. |
There was a problem hiding this comment.
This sentence still says load writes storage directly and bypasses the server, but server-targeted loads use the server's HTTP write path. When an agent follows this skill for omnigraph load --server ..., it can provision or validate only the CLI host for storage writes even though the server performs the Lance write and needs read-write storage access. The later sentence about HTTP load going through the server helps, but the opening sentence still gives the opposite rule for the same command.
Suggested change
| `init`, `load`, and **`cluster apply`** write storage directly (bypassing the server). `cluster apply` is a storage-direct control-plane command — it reaches the object store directly (the `__cluster/` ledger *and* each graph's Lance datasets, to create/migrate/delete them), never through a running server, so the host that runs it needs storage access (the `AWS_*` contract for an `s3://` cluster). That is by design: the control plane is declarative (config → cluster), not a runtime mutation API on the serving process. The server reads **cluster** state read-only at boot, but it is not read-only against storage overall — data-plane HTTP writes (`mutate`/`load`/`branch`) still go through the server to the graph datasets, so it needs read-write storage access. Validate with `curl http://127.0.0.1:8080/healthz`, then `omnigraph snapshot <graph-uri> --json`. | |
| `init`, direct-storage `load`, and **`cluster apply`** write storage directly (bypassing the server). When `load` is run through a configured server, it uses the server's data-plane HTTP path instead. `cluster apply` is a storage-direct control-plane command — it reaches the object store directly (the `__cluster/` ledger *and* each graph's Lance datasets, to create/migrate/delete them), never through a running server, so the host that runs it needs storage access (the `AWS_*` contract for an `s3://` cluster). That is by design: the control plane is declarative (config → cluster), not a runtime mutation API on the serving process. The server reads **cluster** state read-only at boot, but it is not read-only against storage overall — data-plane HTTP writes (`mutate`/`load`/`branch`) still go through the server to the graph datasets, so it needs read-write storage access. Validate with `curl http://127.0.0.1:8080/healthz`, then `omnigraph snapshot <graph-uri> --json`. |
Context Used: AGENTS.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Makes explicit — in both the operator guide and the agent skill — that
omnigraph cluster applyis a storage-direct control-plane command: it reaches the object store directly (the__cluster/ledger and each graph's Lance datasets, to create/migrate/delete them), never through a runningomnigraph-server. The host that runs it (operator or CI) therefore needs storage access — theAWS_*credential contract for ans3://cluster.Why
This is a real doc gap, not a behavior change. The reasoning was already documented in
docs/dev/cluster-axioms.md§3 (no runtime mutation API on the running system) and §4 (config lives outside the running system). The out-of-band / direct-storage fact was stated explicitly for the maintenance verbs (optimize/repair/cleanup, §7) and forinit/load— but never spelled out forapplyitself. A reader could reasonably assumeapplyroutes through the server. It does not, and the credential implication matters operationally.Changes
docs/user/clusters/index.md(§2, the day-2 loop) — adds a note thatapplyruns out-of-band with direct storage access, never via the server, and the host needs storage credentials; links the why to cluster-axioms §3/§4, and notes the server only ever reads the converged ledger (so a held apply lock never blocks serving).skills/omnigraph/SKILL.md(Storage & Credentials) — extends the existing "initandloadwrite storage directly (bypassing the server)" line to includecluster apply, with the declarative-not-runtime-API rationale.Docs-only; no code or behavior change.
🤖 Generated with Claude Code
Greptile Summary
This PR clarifies how
omnigraph cluster applyreaches storage. The main changes are:Confidence Score: 4/5
This is close, but the skill wording should be fixed before merging.
loadbypasses the server, while server-targeted loads use the HTTP write path.skills/omnigraph/SKILL.md
Important Files Changed
Reviews (4): Last reviewed commit: "Merge branch 'main' into docs/cluster-ap..." | Re-trigger Greptile
Context used: