fix(sdk): verify quorum signature on broadcast wait-path before trusting metadata#3872
fix(sdk): verify quorum signature on broadcast wait-path before trusting metadata#3872Claudius-Maginificent wants to merge 6 commits into
Conversation
…ing metadata The broadcast `wait_for_response` closure verified state-transition execution with `Drive::verify_state_transition_was_executed_with_proof`, which performs ONLY the GroveDB structural Merkle check and discards the returned root hash. It never ran `verify_tenderdash_proof`, so the response `metadata` (including `protocol_version`) was not consensus-authenticated. A malicious or MITM DAPI node could return a structurally-valid proof with forged metadata. Route this path through the existing `FromProof<BroadcastStateTransitionRequest>` impl, whose `maybe_from_proof_with_metadata` runs the same structural check AND the quorum BLS signature gate (`verify_tenderdash_proof`). The protocol-version ratchet (`verify_response_metadata`) now runs only after that verification succeeds, so it consumes authenticated metadata. Mock mode, the `StateTransitionProofResult` conversion, the `state_transition_broadcast_error` early-return, and the execution-result/retry semantics are unchanged. The SDK `parse_proof_with_metadata_and_proof` helper does not fit here: it is bounded `O: MockResponse` (and its mock branch needs `Option<O>: MockResponse`), which `StateTransitionProofResult` does not implement. Calling the `FromProof` impl directly verifies the signature without that bound, and the wait-path already obtains its response through the mock-aware DAPI transport. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…proof_with_metadata_and_proof The `## Protocol version bootstrapping` rustdoc described the pre-floor-seeding risk (older network). The SDK now seeds at the floor (`DEFAULT_INITIAL_PROTOCOL_VERSION`), so the real first-request risk is the newer-network direction: a network newer than the floor whose proof interpretation differs from the floor may fail before the ratchet lifts the SDK. Reworded accordingly; the `with_version()` pinning guidance is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oof change The `maybe_result.ok_or_else(...)` branch on the broadcast wait-path is currently unreachable: `FromProof<BroadcastStateTransitionRequest>::maybe_from_proof_with_metadata` always returns `Ok((Some(result), ..))`. Keep the defensive typed error (stay panic-free) and add a one-line comment stating it guards only a future impl change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…verification Drive the broadcast wait-path's security-critical verifier directly: `<StateTransitionProofResult as FromProof<BroadcastStateTransitionRequest>>::maybe_from_proof_with_metadata`, which runs the GroveDB structural check AND `verify_tenderdash_proof` (the quorum BLS signature gate). The test asserts a valid quorum-signed proof verifies and exposes authenticated metadata, and that three forgeries are rejected: a tampered proof signature, a wrong quorum public key, and a mutated `protocol_version` (which alters the signed `StateId.app_version`). A fifth case proves verify-before-ratchet: a valid proof's authenticated metadata lifts a fresh mock SDK via `verify_response_metadata`. The five tests are `#[ignore]`d because they need a captured quorum-signed broadcast proof vector that is not yet committed; they compile against the real APIs so they cannot bitrot. The module documents exactly which vector files to capture (response, state transition, quorum public key) from a v12 devnet and to un-`#[ignore]` once committed. The test is not network-gated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 15 minutes and 45 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…d to devnet-injection follow-up) Remove the bespoke `tests/fetch/broadcast_wait_signature.rs` (the `FixedQuorumKeyProvider` + hand-authored vector loaders) and its `mod` line. The test relied on captured vectors that the current SDK test harness cannot generate (no funded identity or signer for a broadcast), so it never ran. The signature-verification coverage is deferred to a separate devnet-injection follow-up that can produce a real signed proof vector. The production fix (broadcast wait-path quorum-signature verification) and its docs are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
✅ Review complete (commit 9503b20) |
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Verified PR #3872: the broadcast wait-path now routes success-result handling through <StateTransitionProofResult as FromProof<BroadcastStateTransitionRequest>>::maybe_from_proof_with_metadata, which runs the GroveDB structural check and verify_tenderdash_proof before verify_response_metadata consumes metadata.protocol_version for the ratchet. The verify-before-ratchet ordering at broadcast.rs:168-198 matches both agents' analyses; no in-scope blocking issues or actionable suggestions remain.
Issue being fixed or feature implemented
The rs-sdk broadcast write-path (
wait_for_responseinplatform/transition/broadcast.rs) verified state-transition execution proofs withDrive::verify_state_transition_was_executed_with_proof, which performs only the GroveDB structural Merkle check and discards the returned root hash. It never ranverify_tenderdash_proof— the quorum BLS signature gate. So the responseResponseMetadata(includingprotocol_version) was not consensus-authenticated on the write path, unlike the query path (which callsverify_tenderdash_proofin everyFromProofverifier).Why this matters now: this gap surfaced during the security review of #3809, which added a protocol-version auto-detect ratchet to the query path. Extending that ratchet to the write path is unsafe while the broadcast path doesn't verify the signature — a malicious or MITM DAPI node can return a structurally-valid GroveDB proof paired with forged metadata, and a ratchet consuming it would push the SDK's local encoder to a wire shape the real network rejects (a client-local self-DoS, bounded by the upward-only/known-versions-only ratchet guards). The broadcast ratchet was therefore deferred from #3809. This PR closes the underlying signature gap and then lands the ratchet safely.
What breaks without it: a fresh auto-detect SDK whose first traffic is a write (
withdraw_from_identity,top_up_identity,transfer_*,address_credit_withdrawal, …) verifies the execution proof against an unauthenticated metadata blob; any version/state logic keyed off that metadata is attacker-influenceable.Blocking relationship: stacked on the protocol-version work merged in #3809.
What was done?
StateTransitionProofResult: FromProof<BroadcastStateTransitionRequest>verifier, whosemaybe_from_proof_with_metadataruns the same GroveDB structural check andverify_tenderdash_proof(quorum BLS gate). The request is reconstructed from the localStateTransitionviabroadcast_request_for_state_transition(). (TheSdk::parse_proof_with_metadata_and_proofhelper does not fit — it is boundedO: MockResponse, whichStateTransitionProofResultdoes not implement — so theFromProofimpl is invoked directly.)verify_response_metadata(the auto-detect ratchet) now runs strictly after the signature verification succeeds, so it consumes only authenticated metadata (verify-before-ratchet).parse_proof_with_metadata_and_proofbootstrap rustdoc now describes the correct floor-vs-newer-network first-request risk (the SDK seeds at the version floor since fix(sdk): default initial protocol version to 10 when unpinned (upgrade-safe ratchet floor) #3809).How Has This Been Tested?
cargo test -p dash-sdk --lib→ 139 passed, 0 failed.cargo test -p dash-sdk --test main→ 131 passed, 0 failed, 4 ignored (pre-existing baseline; no new ignored tests).cargo fmt -p dash-sdkandcargo clippy -p dash-sdk --all-targets→ clean.verify_tenderdash_proofruns before the ratchet on every path (no bypass via theNone/early-return branches), the ratcheted metadata is the verifier's authenticated return value, and the reconstructed request binds the result to the client's own state transition.state_transition_broadcast_errorearly-return, address/retries propagation unchanged); see Breaking Changes for the one intentional behavior delta.WaitForStateTransitionResultvector, and the SDK test framework has no write path to broadcast one. The vector must come from a deterministic state transition executed in a committed (block-1) block via rs-drive-abci'screate_sdk_test_datadevnet injection, then captured for offline replay — a separate, engine-focused PR. Meanwhile the fix's correctness rests on the structurally-guaranteed verify-before-ratchet ordering (the ratchet sits strictly after the verifier's?) plus the existingtest_ratchet_rejects_unknown_and_non_upward_versionsguard — both confirmed by the security review.Breaking Changes
No public API breaks. Two behavior notes:
Error::Proof) — matching the query path's established convention — so a bad/forged proof from one node prompts a retry against another server instead of a hard fail (bounded bymax_retries). Previously these failures fast-failed as the non-retryableError::DriveProofError.Checklist:
🤖 Co-authored by Claudius the Magnificent AI Agent