fix(consensus): resolve cross-RPC tx freeze — deterministic ordering + derive-at-apply fees + min-shard floor [epic #21]#948
Conversation
…P-ORDER, ep#21] Replace node-variant timestamp ordering of the merged mempool with a deterministic (sender, nonce, hash) total order in mergeAndOrderMempools, fork-gated on nonceEnforcement. Every honest node forging the same merged set now produces a byte-identical ordered_transactions list (vote convergence -> no stall) and same-sender txs apply in nonce order. - New leaf module deterministicOrder.ts (no heavy imports; inlined forge-buffer->hex so it stays unit-testable in isolation). - 5 unit tests (determinism, per-sender nonce order, hash tie-break, no-mutation, total-order antisymmetry). - Verified in devnet: forging node logs the ordering line and finalizes blocks with it active; full multi-node acceptance blocked by a pre-existing devnet fixture identity/genesis desync (task #203). Part of epic #21. Next: P-TRIM (forge-after-apply), the keystone fix. Refs: docs/discoveries/nonce-multirpc-fork-2026-06-25.md
…tating signed tx [#204, ep#21] Root cause of the cross-RPC consensus freeze: confirmTransaction (via applyGasFeeSeparation) mutated the SDK-signed tx AFTER signing — prepended fee-distribution gcr_edits and overwrote tx.content.transaction_fee (rpc_address + reshaped amounts). Both fields are inside the coherence hash, but tx.hash was never recomputed (can't — would break the sender signature). On gossip, every peer ran validateTxCoherence, recomputed the hash from the mutated content, got a mismatch, and rejected the tx as 'not coherent'. Each node then held only its own tx, forged divergent blocks, and the 2/3 vote never converged → no native tx ever committed cross-node → chain frozen under any transaction inflow (the colleague's reported instability). Fix (derive-at-apply): applyGasFeeSeparation no longer mutates the tx — it keeps only the ingress balance pre-check. The gossiped/stored tx is now byte-identical to what the sender signed, so coherence passes on every node. HandleGCR.applyTransactions derives the fee-distribution edits at apply time from each tx's SDK-shipped transaction_fee (deterministic across nodes) and prepends them before prepareEntities/partition/apply, so the fee accounts are cached and grouped correctly. verifyGcrEditsMatch: expectFeeEdits=false at ingress (no fee edits on the wire), true at apply (derived edits present). rpc_address is null at this stage → the rpc-fee share folds into treasury deterministically; per-tx rpc-operator routing via a BFT-committed fee envelope is the mainnet follow-up (does NOT block the testnet RC). Verified on a 2-node devnet (RC config): single tx commits (was frozen); 3 sequential cross-RPC txs all land in nonce order; both nodes stay byte-identical (no divergence); fees collected to treasury; same-tx replay to both RPCs yields no double-spend. typecheck: 0 new errors. Part of epic #21. Devnet genesis regenerated for 2 nodes; adds the multi_rpc_sequential_nonce devnet driver. Refs: docs/discoveries/nonce-multirpc-fork-2026-06-25.md
…[P-MINSHARD #197, ep#21] P-MINSHARD (#197): isBlockValid refuses to finalize when totalVotes < MIN_SHARD (=2). A 1-node shard otherwise self-certifies (BFT threshold floor(2/3)+1 = 1), which is the solo-fork source (Bug B): a lone node forges, rejoins, and its divergent block carries a 'valid' signature count nobody endorsed. Below the floor the chain correctly STALLS (safety over liveness) rather than forking. 2 = testnet RC minimum (both nodes must agree). Determinism (part of P-CLOCK #195): getShard sorted the peer list with localeCompare before seeded sampling — locale-sensitive, so two nodes under different host locales could order peers differently and select divergent shards (latent consensus split). Switched to plain byte comparison on the ASCII hex identities (stable total order on every node). Verified on 2-node devnet: chain advances (N=2 meets the floor, pro=2/con=0), tx still commits, no regression. typecheck: 0 new errors. Part of epic #21.
…-root-vote follow-up scoped
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Warning Review limit reached
More reviews will be available in 30 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (11)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
The consensus fixes in this PR were built and verified against demosdk 4.0.12. Pin it so CI and other nodes build against the same SDK the fix was validated on.
|
@greptile review |
Greptile SummaryThis PR updates consensus handling for cross-RPC native transactions. The main changes are:
Confidence Score: 5/5The consensus changes are focused and include targeted coverage for deterministic transaction ordering and two-node devnet behavior. No blocking correctness issues were identified in the reviewed changes, and the implementation aligns with the stated goal of preventing cross-RPC transaction divergence.
What T-Rex did
Reviews (8): Last reviewed commit: "fix: per-component fee binding + correct..." | Re-trigger Greptile |
Greptile SummaryThis PR fixes cross-RPC transaction freezing by keeping signed transactions stable and making consensus ordering deterministic. The main changes are:
Confidence Score: 4/5The consensus changes are focused and backed by deterministic-ordering tests and devnet verification assets, with one devnet helper script issue needing cleanup. The core transaction-stability approach is coherent and the risky mutation path is addressed, but the included multi-RPC verification script currently does not match the documented two-node devnet setup. testing/devnet/scripts/multi_rpc_sequential_nonce.mjs
What T-Rex did
Reviews (2): Last reviewed commit: "chore(deps): bump @kynesyslabs/demosdk 4..." | Re-trigger Greptile |
…ipt [ep#21]
P1 (fee bypass): applyGasFeeSeparation now BINDS the SDK-shipped
transaction_fee to the node-computed breakdown — rejects any tx whose shipped
fee TOTAL != computed total (in OS). Without this, since the fee is charged at
apply from the shipped fields, a client could ship {0,0,0} and pay nothing.
Bound on TOTAL (not per-component): the sender commits to the total, the node
owns the network/rpc/additional split. deriveFeeEditsForApply derives from the
shipped transaction_fee — the SAME source verifyGcrEditsMatch regenerates from
at apply — so injected edits and the binding check agree (a breakdown-based
derive diverged from the shipped-based regen and false-rejected every tx).
P2: multi_rpc_sequential_nonce.mjs now requires >=2 RPCs (NODE3_URL optional),
matching the documented 2-validator RC devnet, instead of hard-failing without
a 3rd.
Verified on 2-node devnet: legit tx still commits; fee bound at ingress.
Part of epic #21 / PR #948.
|
@greptile review |
|
@greptile review |
…ile T-Rex findings) [ep#21] P1 (fee distribution correctness): applyGasFeeSeparation now binds the shipped transaction_fee to the node-computed breakdown PER-COMPONENT (network/rpc/ additional), not just on total. Total-only binding let a client skew the split (e.g. whole fee in network_fee, rpc_fee=0) and pass validation while apply emitted no RPC-fee block -> deterministically wrong distribution. Per-component binding rejects skew; legit txs (SDK ships the same split the node computes) pass. Consistent with deriveFeeEditsForApply, which charges from the same shipped components. P2 (verify script): sample() sliced infos by N (tx count) instead of clients.length (RPC count), mixing sender/receiver objects when N != client count (2-RPC devnet, N=3). Slice by clients.length. Verified on 2-node devnet: 'all 3 sequential-nonce txs landed in order across RPCs' — binding passes legit txs, slice reports correctly. Part of epic #21 / PR #948.
|
@greptile review |
|
All Greptile findings addressed and verified on the 2-node devnet:
The latest Greptile pass re-emitted the same three comments against the new commit, but the code at those exact lines already contains the fixes (verified). Treating them as stale re-flags; threads resolved. |
Summary
Fixes the consensus instability where transactions were accepted at RPC but never committed cross-node, freezing the chain under transaction inflow (manual node restarts / shard fiddling required). Verified end-to-end on a 2-node devnet (the testnet RC config).
Epic 21. This PR is the stability mandate; the hardening mandate (state-root vote) is a scoped follow-up (see below).
Root cause
confirmTransaction(viaapplyGasFeeSeparation, gasFeeSeparation fork) mutated the SDK-signed tx after signing — prepended fee-distributiongcr_editsand overwrotetx.content.transaction_fee(stampedrpc_address, reshaped amounts). Both are inside the coherence hash, buttx.hashwas never recomputed (can't — would break the sender signature). On gossip, every peer ranvalidateTxCoherence, recomputed the hash from the mutated content, got a mismatch, and rejected the tx as "not coherent". Each node then held only its own tx, forged divergent blocks, the 2/3 vote never converged → no native tx ever committed.Changes
c831ce0c8) — deterministic(sender,nonce,hash)mempool ordering, fork-gated onnonceEnforcement. Every honest node forges a byte-identicalordered_transactions→ vote convergence; same-sender txs apply in nonce order.b07f5e3bf) —applyGasFeeSeparationno longer mutates the signed tx (keeps only the ingress balance pre-check). Fee-distribution edits are derived at apply from each tx's SDK-shippedtransaction_fee(deterministic across nodes), prepended beforeprepareEntities/partition.verifyGcrEditsMatch:expectFeeEdits=falseat ingress,trueat apply. Gossiped tx is byte-identical to the signed one → coherence passes.fcf6feb6e) —isBlockValidrefuses to finalize belowMIN_SHARD=2(a 1-node shard otherwise self-certifies: BFT thresholdfloor(2/3)+1=1→ solo-fork source). Below the floor the chain stalls (safety over liveness). +getShardpeer sortlocaleCompare→byte comparison (locale-sensitive ordering was a latent shard-divergence source).Verification (2-node devnet, RC config)
pro=2/con=0; N=2 meets the MIN_SHARD floorNOT in this PR — scoped follow-up (hardening mandate)
P-TRIM+P-DETECT+P-CLOCK+P-ADMITform one coupled state-root-vote redesign (forge→apply→seal-post-apply→vote-on-post-apply-hash, + full execution determinism). High blast radius; deferred to its own branch with its own plan + adversarial pass. With P-ORDER + 204, sequential txs already commit, so these are hardening/throughput, not stability.Notes for reviewers
rpc_addressis intentionally null at this stage → the rpc-fee share folds into treasury deterministically. Per-tx rpc-operator routing via a BFT-committed fee envelope is part of the follow-up.MIN_SHARDis a module constant (=2) so all nodes agree; should become a genesis/fork parameter for mainnet.4.0.9 → 4.0.12bump inpackage.json/bun.lockis a separate pending change (not in this PR's commits). The fix was built/tested against 4.0.12.Refs:
docs/discoveries/nonce-multirpc-fork-2026-06-25.md