Skip to content

fix(sdk): verify quorum signature on broadcast wait-path before trusting metadata#3872

Open
Claudius-Maginificent wants to merge 6 commits into
v3.1-devfrom
fix/sdk-broadcast-signature-verification
Open

fix(sdk): verify quorum signature on broadcast wait-path before trusting metadata#3872
Claudius-Maginificent wants to merge 6 commits into
v3.1-devfrom
fix/sdk-broadcast-signature-verification

Conversation

@Claudius-Maginificent

@Claudius-Maginificent Claudius-Maginificent commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Issue being fixed or feature implemented

The rs-sdk broadcast write-path (wait_for_response in platform/transition/broadcast.rs) verified state-transition execution proofs with Drive::verify_state_transition_was_executed_with_proof, which performs only the GroveDB structural Merkle check and discards the returned root hash. It never ran verify_tenderdash_proof — the quorum BLS signature gate. So the response ResponseMetadata (including protocol_version) was not consensus-authenticated on the write path, unlike the query path (which calls verify_tenderdash_proof in every FromProof verifier).

Why this matters now: this gap surfaced during the security review of #3809, which added a protocol-version auto-detect ratchet to the query path. Extending that ratchet to the write path is unsafe while the broadcast path doesn't verify the signature — a malicious or MITM DAPI node can return a structurally-valid GroveDB proof paired with forged metadata, and a ratchet consuming it would push the SDK's local encoder to a wire shape the real network rejects (a client-local self-DoS, bounded by the upward-only/known-versions-only ratchet guards). The broadcast ratchet was therefore deferred from #3809. This PR closes the underlying signature gap and then lands the ratchet safely.

What breaks without it: a fresh auto-detect SDK whose first traffic is a write (withdraw_from_identity, top_up_identity, transfer_*, address_credit_withdrawal, …) verifies the execution proof against an unauthenticated metadata blob; any version/state logic keyed off that metadata is attacker-influenceable.

Blocking relationship: stacked on the protocol-version work merged in #3809.

What was done?

  • Verify the quorum signature on the broadcast wait-path. Route result verification through the existing StateTransitionProofResult: FromProof<BroadcastStateTransitionRequest> verifier, whose maybe_from_proof_with_metadata runs the same GroveDB structural check and verify_tenderdash_proof (quorum BLS gate). The request is reconstructed from the local StateTransition via broadcast_request_for_state_transition(). (The Sdk::parse_proof_with_metadata_and_proof helper does not fit — it is bounded O: MockResponse, which StateTransitionProofResult does not implement — so the FromProof impl is invoked directly.)
  • Ratchet the protocol version safely. verify_response_metadata (the auto-detect ratchet) now runs strictly after the signature verification succeeds, so it consumes only authenticated metadata (verify-before-ratchet).
  • Docs/comments corrected. Guard comments no longer imply a BLS binding the code didn't perform; the parse_proof_with_metadata_and_proof bootstrap rustdoc now describes the correct floor-vs-newer-network first-request risk (the SDK seeds at the version floor since fix(sdk): default initial protocol version to 10 when unpinned (upgrade-safe ratchet floor) #3809).
  • Coverage note. A dedicated offline test for this path is deferred to a follow-up (see Testing): the SDK test framework has no write path, so a broadcast-wait vector must be sourced from a devnet-injected state transition.

How Has This Been Tested?

  • cargo test -p dash-sdk --lib139 passed, 0 failed.
  • cargo test -p dash-sdk --test main131 passed, 0 failed, 4 ignored (pre-existing baseline; no new ignored tests).
  • cargo fmt -p dash-sdk and cargo clippy -p dash-sdk --all-targets → clean.
  • Independent security review: confirmed verify_tenderdash_proof runs before the ratchet on every path (no bypass via the None/early-return branches), the ratcheted metadata is the verifier's authenticated return value, and the reconstructed request binds the result to the client's own state transition.
  • Independent QA review: behavior-preservation audit (result conversion, state_transition_broadcast_error early-return, address/retries propagation unchanged); see Breaking Changes for the one intentional behavior delta.
  • Dedicated test coverage is a tracked follow-up. This path can't be unit-tested without a quorum-signed WaitForStateTransitionResult vector, and the SDK test framework has no write path to broadcast one. The vector must come from a deterministic state transition executed in a committed (block-1) block via rs-drive-abci's create_sdk_test_data devnet injection, then captured for offline replay — a separate, engine-focused PR. Meanwhile the fix's correctness rests on the structurally-guaranteed verify-before-ratchet ordering (the ratchet sits strictly after the verifier's ?) plus the existing test_ratchet_rejects_unknown_and_non_upward_versions guard — both confirmed by the security review.

Breaking Changes

No public API breaks. Two behavior notes:

  • Retry classification (intentional): a proof/signature verification failure on the broadcast wait-path is now classified retryable (Error::Proof) — matching the query path's established convention — so a bad/forged proof from one node prompts a retry against another server instead of a hard fail (bounded by max_retries). Previously these failures fast-failed as the non-retryable Error::DriveProofError.
  • Mock-mode broadcast: the path now always runs signature verification, so any consumer mocking a broadcast must supply a context provider with valid quorum keys. (No in-repo mock broadcast support exists today.)

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests (dedicated broadcast-path test deferred to a devnet-injection follow-up — see Testing)
  • I have added "!" to the title and described breaking changes (no API break; behavior deltas documented above)
  • I have made corresponding changes to the documentation if needed

🤖 Co-authored by Claudius the Magnificent AI Agent

lklimek and others added 4 commits June 12, 2026 17:27
…ing metadata

The broadcast `wait_for_response` closure verified state-transition execution
with `Drive::verify_state_transition_was_executed_with_proof`, which performs
ONLY the GroveDB structural Merkle check and discards the returned root hash.
It never ran `verify_tenderdash_proof`, so the response `metadata` (including
`protocol_version`) was not consensus-authenticated. A malicious or MITM DAPI
node could return a structurally-valid proof with forged metadata.

Route this path through the existing
`FromProof<BroadcastStateTransitionRequest>` impl, whose
`maybe_from_proof_with_metadata` runs the same structural check AND the quorum
BLS signature gate (`verify_tenderdash_proof`). The protocol-version ratchet
(`verify_response_metadata`) now runs only after that verification succeeds,
so it consumes authenticated metadata. Mock mode, the
`StateTransitionProofResult` conversion, the `state_transition_broadcast_error`
early-return, and the execution-result/retry semantics are unchanged.

The SDK `parse_proof_with_metadata_and_proof` helper does not fit here: it is
bounded `O: MockResponse` (and its mock branch needs `Option<O>: MockResponse`),
which `StateTransitionProofResult` does not implement. Calling the `FromProof`
impl directly verifies the signature without that bound, and the wait-path
already obtains its response through the mock-aware DAPI transport.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…proof_with_metadata_and_proof

The `## Protocol version bootstrapping` rustdoc described the pre-floor-seeding
risk (older network). The SDK now seeds at the floor
(`DEFAULT_INITIAL_PROTOCOL_VERSION`), so the real first-request risk is the
newer-network direction: a network newer than the floor whose proof
interpretation differs from the floor may fail before the ratchet lifts the SDK.
Reworded accordingly; the `with_version()` pinning guidance is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oof change

The `maybe_result.ok_or_else(...)` branch on the broadcast wait-path is currently
unreachable: `FromProof<BroadcastStateTransitionRequest>::maybe_from_proof_with_metadata`
always returns `Ok((Some(result), ..))`. Keep the defensive typed error (stay
panic-free) and add a one-line comment stating it guards only a future impl change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…verification

Drive the broadcast wait-path's security-critical verifier directly:
`<StateTransitionProofResult as FromProof<BroadcastStateTransitionRequest>>::maybe_from_proof_with_metadata`,
which runs the GroveDB structural check AND `verify_tenderdash_proof` (the quorum
BLS signature gate). The test asserts a valid quorum-signed proof verifies and
exposes authenticated metadata, and that three forgeries are rejected: a tampered
proof signature, a wrong quorum public key, and a mutated `protocol_version`
(which alters the signed `StateId.app_version`). A fifth case proves
verify-before-ratchet: a valid proof's authenticated metadata lifts a fresh mock
SDK via `verify_response_metadata`.

The five tests are `#[ignore]`d because they need a captured quorum-signed
broadcast proof vector that is not yet committed; they compile against the real
APIs so they cannot bitrot. The module documents exactly which vector files to
capture (response, state transition, quorum public key) from a v12 devnet and to
un-`#[ignore]` once committed. The test is not network-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Warning

Review limit reached

@lklimek, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 15 minutes and 45 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f05f6cb-094c-4996-badf-95233b770028

📥 Commits

Reviewing files that changed from the base of the PR and between e2039e5 and 9503b20.

📒 Files selected for processing (2)
  • packages/rs-sdk/src/platform/transition/broadcast.rs
  • packages/rs-sdk/src/sdk.rs
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/sdk-broadcast-signature-verification

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

lklimek and others added 2 commits June 12, 2026 23:25
…d to devnet-injection follow-up)

Remove the bespoke `tests/fetch/broadcast_wait_signature.rs` (the
`FixedQuorumKeyProvider` + hand-authored vector loaders) and its `mod` line. The
test relied on captured vectors that the current SDK test harness cannot generate
(no funded identity or signer for a broadcast), so it never ran. The
signature-verification coverage is deferred to a separate devnet-injection
follow-up that can produce a real signed proof vector.

The production fix (broadcast wait-path quorum-signature verification) and its
docs are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lklimek lklimek marked this pull request as ready for review June 15, 2026 09:57
@thepastaclaw

thepastaclaw commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

✅ Review complete (commit 9503b20)

@thepastaclaw thepastaclaw left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Verified PR #3872: the broadcast wait-path now routes success-result handling through <StateTransitionProofResult as FromProof<BroadcastStateTransitionRequest>>::maybe_from_proof_with_metadata, which runs the GroveDB structural check and verify_tenderdash_proof before verify_response_metadata consumes metadata.protocol_version for the ratchet. The verify-before-ratchet ordering at broadcast.rs:168-198 matches both agents' analyses; no in-scope blocking issues or actionable suggestions remain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants