
fix(consensus): resolve cross-RPC tx freeze — deterministic ordering + derive-at-apply fees + min-shard floor [epic #21]#948

Merged
tcsenpai merged 8 commits into stabilisation from fix/consensus-deterministic-ordering-ep21
Jun 25, 2026

tcsenpai commented Jun 25, 2026

Contributor

Summary

Fixes the consensus instability where transactions were accepted at RPC but never committed cross-node, freezing the chain under transaction inflow (manual node restarts / shard fiddling required). Verified end-to-end on a 2-node devnet (the testnet RC config).

Epic 21. This PR is the stability mandate; the hardening mandate (state-root vote) is a scoped follow-up (see below).

Root cause

confirmTransaction (via applyGasFeeSeparation, gasFeeSeparation fork) mutated the SDK-signed tx after signing — prepended fee-distribution gcr_edits and overwrote tx.content.transaction_fee (stamped rpc_address, reshaped amounts). Both are inside the coherence hash, but tx.hash was never recomputed (can't — would break the sender signature). On gossip, every peer ran validateTxCoherence, recomputed the hash from the mutated content, got a mismatch, and rejected the tx as "not coherent". Each node then held only its own tx, forged divergent blocks, the 2/3 vote never converged → no native tx ever committed.
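The failure mode can be reproduced in a few lines. This is a minimal sketch, not the node's code: the transaction shape is simplified, and it assumes (hypothetically) that the coherence hash is sha256 over a JSON serialization of the content. `hashContent` and `isCoherent` are illustrative stand-ins for the hashing done by the SDK and by `validateTxCoherence`.

```typescript
import { createHash } from "node:crypto"

// Hypothetical, simplified shapes; the real SDK transaction type has more fields.
interface TransactionFee {
  network_fee: number
  rpc_fee: number
  additional_fee: number
  rpc_address: string | null
}

interface TxContent {
  from: string
  to: string
  amount: number
  nonce: number
  gcr_edits: unknown[]
  transaction_fee: TransactionFee
}

interface Tx {
  content: TxContent
  hash: string // fixed by the sender before signing; the signature covers it
}

// Assumption: the coherence hash is sha256 over a JSON serialization of content.
function hashContent(content: TxContent): string {
  return createHash("sha256").update(JSON.stringify(content)).digest("hex")
}

// Stand-in for validateTxCoherence: recompute the hash and compare.
function isCoherent(tx: Tx): boolean {
  return hashContent(tx.content) === tx.hash
}

const content: TxContent = {
  from: "alice",
  to: "bob",
  amount: 10,
  nonce: 1,
  gcr_edits: [],
  transaction_fee: { network_fee: 1, rpc_fee: 1, additional_fee: 0, rpc_address: null },
}
const signed: Tx = { content, hash: hashContent(content) }

// Old confirm-time behaviour: prepend a fee edit and stamp rpc_address,
// but keep the old hash (recomputing it would break the signature).
const mutated: Tx = {
  hash: signed.hash,
  content: {
    ...signed.content,
    gcr_edits: [{ kind: "fee" }, ...signed.content.gcr_edits],
    transaction_fee: { ...signed.content.transaction_fee, rpc_address: "rpc-node-a" },
  },
}

console.log(isCoherent(signed)) // true: the untouched signed tx
console.log(isCoherent(mutated)) // false: what peers saw before the fix
```

Because the hash is bound to the signature, the only way to keep gossip coherent is to never mutate the signed content, which is what derive-at-apply does.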

Changes

  • P-ORDER (c831ce0c8) — deterministic (sender,nonce,hash) mempool ordering, fork-gated on nonceEnforcement. Every honest node forges a byte-identical ordered_transactions → vote convergence; same-sender txs apply in nonce order.
  • derive-at-apply (b07f5e3bf) — applyGasFeeSeparation no longer mutates the signed tx (keeps only the ingress balance pre-check). Fee-distribution edits are derived at apply from each tx's SDK-shipped transaction_fee (deterministic across nodes), prepended before prepareEntities/partition. verifyGcrEditsMatch: expectFeeEdits=false at ingress, true at apply. Gossiped tx is byte-identical to the signed one → coherence passes.
  • P-MINSHARD (fcf6feb6e) — isBlockValid refuses to finalize below MIN_SHARD=2 (a 1-node shard otherwise self-certifies, since the BFT threshold is floor(2·1/3)+1 = 1, making it a solo-fork source). Below the floor the chain stalls (safety over liveness). Also switches the getShard peer sort from localeCompare to byte comparison (locale-sensitive ordering was a latent shard-divergence source).

Verification (2-node devnet, RC config)

  • ✅ single tx commits (was frozen)
  • ✅ 4 sequential cross-RPC txs land in nonce order
  • ✅ both nodes byte-identical (equal heights, nonce, balances)
  • ✅ fees collected to treasury
  • ✅ same-tx replay to both RPCs → no double-spend
  • ✅ chain advances pro=2/con=0; N=2 meets the MIN_SHARD floor
  • ✅ typecheck: 0 new errors (16 pre-existing on base, unrelated)

NOT in this PR — scoped follow-up (hardening mandate)

P-TRIM, P-DETECT, P-CLOCK and P-ADMIT form one coupled state-root-vote redesign (forge→apply→seal-post-apply→vote-on-post-apply-hash, plus full execution determinism). It has a high blast radius, so it is deferred to its own branch with its own plan and adversarial pass. With P-ORDER and #204 (derive-at-apply) in place, sequential txs already commit, so these changes are about hardening and throughput, not stability.

Notes for reviewers

  • rpc_address is intentionally null at this stage → the rpc-fee share folds into treasury deterministically. Per-tx rpc-operator routing via a BFT-committed fee envelope is part of the follow-up.
  • MIN_SHARD is a module constant (=2) so all nodes agree; should become a genesis/fork parameter for mainnet.
  • The demosdk 4.0.9 → 4.0.12 bump in package.json/bun.lock is a separate pending change (not in this PR's commits). The fix was built/tested against 4.0.12.

Refs: docs/discoveries/nonce-multirpc-fork-2026-06-25.md

tcsenpai added 4 commits June 25, 2026 16:53
…P-ORDER, ep#21]

Replace node-variant timestamp ordering of the merged mempool with a
deterministic (sender, nonce, hash) total order in mergeAndOrderMempools,
fork-gated on nonceEnforcement. Every honest node forging the same merged
set now produces a byte-identical ordered_transactions list (vote
convergence -> no stall) and same-sender txs apply in nonce order.
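A (sender, nonce, hash) total order can be sketched as a comparator chain. This is an illustration under assumptions, not the contents of deterministicOrder.ts: `MempoolTx`, `compareTx` and `sortMergedMempool` are hypothetical names, and the entry shape is reduced to the three sort keys. Plain code-unit comparison is used so no locale is involved.

```typescript
// Hypothetical minimal shape; real mempool entries carry the full signed tx.
interface MempoolTx {
  sender: string // hex identity
  nonce: number
  hash: string // hex tx hash
}

// Plain code-unit comparison: no locale involved, so all nodes agree.
function cmpStr(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0
}

// Total order: sender, then nonce, then hash as the final tie-break.
function compareTx(a: MempoolTx, b: MempoolTx): number {
  return cmpStr(a.sender, b.sender) || a.nonce - b.nonce || cmpStr(a.hash, b.hash)
}

// Returns a sorted copy; the merged input list is left untouched.
function sortMergedMempool(txs: readonly MempoolTx[]): MempoolTx[] {
  return [...txs].sort(compareTx)
}

const t1: MempoolTx = { sender: "aa", nonce: 2, hash: "03" }
const t2: MempoolTx = { sender: "aa", nonce: 1, hash: "ff" }
const t3: MempoolTx = { sender: "0b", nonce: 7, hash: "01" }

// Two nodes that received the same txs in different orders:
const nodeA = sortMergedMempool([t1, t2, t3]).map((t) => t.hash)
const nodeB = sortMergedMempool([t3, t1, t2]).map((t) => t.hash)
console.log(nodeA, nodeB) // both list the hashes as 01, ff, 03
```

Arrival order no longer matters, and the two txs from sender `aa` come out in nonce order regardless of their hashes.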

- New leaf module deterministicOrder.ts (no heavy imports; inlined
  forge-buffer->hex so it stays unit-testable in isolation).
- 5 unit tests (determinism, per-sender nonce order, hash tie-break,
  no-mutation, total-order antisymmetry).
- Verified in devnet: forging node logs the ordering line and finalizes
  blocks with it active; full multi-node acceptance blocked by a
  pre-existing devnet fixture identity/genesis desync (task #203).

Part of epic #21. Next: P-TRIM (forge-after-apply), the keystone fix.

Refs: docs/discoveries/nonce-multirpc-fork-2026-06-25.md
…tating signed tx [#204, ep#21]

Root cause of the cross-RPC consensus freeze: confirmTransaction (via
applyGasFeeSeparation) mutated the SDK-signed tx AFTER signing — prepended
fee-distribution gcr_edits and overwrote tx.content.transaction_fee
(rpc_address + reshaped amounts). Both fields are inside the coherence hash,
but tx.hash was never recomputed (can't — would break the sender signature).
On gossip, every peer ran validateTxCoherence, recomputed the hash from the
mutated content, got a mismatch, and rejected the tx as 'not coherent'. Each
node then held only its own tx, forged divergent blocks, and the 2/3 vote
never converged → no native tx ever committed cross-node → chain frozen under
any transaction inflow (the colleague's reported instability).

Fix (derive-at-apply): applyGasFeeSeparation no longer mutates the tx — it
keeps only the ingress balance pre-check. The gossiped/stored tx is now
byte-identical to what the sender signed, so coherence passes on every node.
HandleGCR.applyTransactions derives the fee-distribution edits at apply time
from each tx's SDK-shipped transaction_fee (deterministic across nodes) and
prepends them before prepareEntities/partition/apply, so the fee accounts are
cached and grouped correctly. verifyGcrEditsMatch: expectFeeEdits=false at
ingress (no fee edits on the wire), true at apply (derived edits present).

rpc_address is null at this stage → the rpc-fee share folds into treasury
deterministically; per-tx rpc-operator routing via a BFT-committed fee
envelope is the mainnet follow-up (does NOT block the testnet RC).
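Derive-at-apply can be sketched as a pure function of the signed `transaction_fee`, combined with the `expectFeeEdits` switch. Everything here is a simplified assumption: the `BalanceEdit` shape and the `deriveFeeEdits`, `expectedEdits` and `editsMatch` helpers are hypothetical, `"treasury"` is a placeholder identity, and crediting `network_fee` and `additional_fee` to the treasury is an assumed split. The point it illustrates is that the fee edits exist only at apply time and are identical on every node because they depend only on signed data.

```typescript
// Hypothetical, simplified shapes; the real fee and GCR edit types differ.
interface TransactionFee {
  network_fee: number
  rpc_fee: number
  additional_fee: number
  rpc_address: string | null
}

interface BalanceEdit {
  account: string
  operation: "add" | "remove"
  amount: number
}

// Placeholder identity, not the real treasury address.
const TREASURY = "treasury"

// Pure function of the signed transaction_fee, so every node derives the same edits.
// Assumption: network_fee and additional_fee are credited to the treasury.
function deriveFeeEdits(sender: string, fee: TransactionFee): BalanceEdit[] {
  const total = fee.network_fee + fee.rpc_fee + fee.additional_fee
  const edits: BalanceEdit[] = [{ account: sender, operation: "remove", amount: total }]
  const treasuryShare = fee.network_fee + fee.additional_fee
  if (treasuryShare > 0) {
    edits.push({ account: TREASURY, operation: "add", amount: treasuryShare })
  }
  if (fee.rpc_fee > 0) {
    // rpc_address is null at this stage, so the rpc share folds into treasury.
    edits.push({ account: fee.rpc_address ?? TREASURY, operation: "add", amount: fee.rpc_fee })
  }
  return edits
}

// Expected edit list for the expectFeeEdits switch: false at ingress (no fee
// edits on the wire), true at apply (derived fee edits prepended).
function expectedEdits(
  sender: string,
  fee: TransactionFee,
  signedEdits: readonly BalanceEdit[],
  expectFeeEdits: boolean,
): BalanceEdit[] {
  return expectFeeEdits ? [...deriveFeeEdits(sender, fee), ...signedEdits] : [...signedEdits]
}

// Order-sensitive structural comparison, enough for this sketch.
function editsMatch(actual: readonly BalanceEdit[], expected: readonly BalanceEdit[]): boolean {
  return JSON.stringify(actual) === JSON.stringify(expected)
}

const fee: TransactionFee = { network_fee: 1, rpc_fee: 2, additional_fee: 0, rpc_address: null }
const signedEdits: BalanceEdit[] = [{ account: "bob", operation: "add", amount: 10 }]

// Ingress: the wire tx carries only its signed edits.
console.log(editsMatch(signedEdits, expectedEdits("alice", fee, signedEdits, false))) // true
// Apply: 3 derived fee edits followed by the signed edit.
console.log(expectedEdits("alice", fee, signedEdits, true).length) // 4
```

Prepending (rather than appending) keeps the fee accounts in the list that entity preparation and partitioning see first, matching the ordering described above.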

Verified on a 2-node devnet (RC config): single tx commits (was frozen);
3 sequential cross-RPC txs all land in nonce order; both nodes stay
byte-identical (no divergence); fees collected to treasury; same-tx replay to
both RPCs yields no double-spend. typecheck: 0 new errors.

Part of epic #21. Devnet genesis regenerated for 2 nodes; adds the
multi_rpc_sequential_nonce devnet driver.

Refs: docs/discoveries/nonce-multirpc-fork-2026-06-25.md
…[P-MINSHARD #197, ep#21]

P-MINSHARD (#197): isBlockValid refuses to finalize when totalVotes <
MIN_SHARD (=2). A 1-node shard otherwise self-certifies (BFT threshold
floor(2/3)+1 = 1), which is the solo-fork source (Bug B): a lone node forges,
rejoins, and its divergent block carries a 'valid' signature count nobody
endorsed. Below the floor the chain correctly STALLS (safety over liveness)
rather than forking. 2 = testnet RC minimum (both nodes must agree).
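The arithmetic behind the floor, as a minimal sketch. `isBlockValidSketch` is a hypothetical stand-in for isBlockValid; it assumes the threshold is computed over `totalVotes`, and generalizes the floor(2/3)+1 expression above to floor(2n/3)+1 for n voters (the usual "more than two thirds" rule for integer vote counts).

```typescript
const MIN_SHARD = 2

// Strict BFT supermajority over n voters: more than 2n/3, i.e. floor(2n/3) + 1.
function bftThreshold(n: number): number {
  return Math.floor((2 * n) / 3) + 1
}

// Hypothetical signature; assumes the threshold is taken over totalVotes.
function isBlockValidSketch(pro: number, totalVotes: number): boolean {
  if (totalVotes < MIN_SHARD) return false // stall instead of self-certifying
  return pro >= bftThreshold(totalVotes)
}

console.log(bftThreshold(1)) // 1: a lone node would certify itself
console.log(isBlockValidSketch(1, 1)) // false
console.log(isBlockValidSketch(2, 2)) // true
console.log(isBlockValidSketch(1, 2)) // false
```

Without the `MIN_SHARD` guard, the first call would pass: one vote meets a threshold of one.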

Determinism (part of P-CLOCK #195): getShard sorted the peer list with
localeCompare before seeded sampling — locale-sensitive, so two nodes under
different host locales could order peers differently and select divergent
shards (latent consensus split). Switched to plain byte comparison on the
ASCII hex identities (stable total order on every node).
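A byte-stable peer sort can be sketched with the plain `<` operator, which compares UTF-16 code units; for ASCII hex strings that coincides with byte order. `sortPeerIdentities` is a hypothetical name, and the seeded sampling that follows the sort in getShard is not shown.

```typescript
// Code-unit comparison. For ASCII hex identities this is the same as byte
// order, and unlike localeCompare it does not depend on the host's locale.
function byteCompare(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0
}

// Sorted copy of the peer identities, stable across hosts.
function sortPeerIdentities(ids: readonly string[]): string[] {
  return [...ids].sort(byteCompare)
}

console.log(sortPeerIdentities(["b1", "0f", "a0", "A0"])) // [ '0f', 'A0', 'a0', 'b1' ]
```

Uppercase sorts before lowercase here ('A' is 0x41, 'a' is 0x61) on every host, whereas a collator may group them differently depending on locale data.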

Verified on 2-node devnet: chain advances (N=2 meets the floor, pro=2/con=0),
tx still commits, no regression. typecheck: 0 new errors.

Part of epic #21.

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Contributor

Warning

Review limit reached

@tcsenpai, we couldn't start this review because you've reached your PR review rate limit.

📥 Commits

Reviewing files that changed from the base of the PR and between b0a1fa6 and b17ee66.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • docs/discoveries/nonce-multirpc-fork-2026-06-25.md
  • package.json
  • src/libs/blockchain/gcr/handleGCR.ts
  • src/libs/blockchain/mempool.ts
  • src/libs/blockchain/routines/applyGasFeeSeparation.ts
  • src/libs/consensus/v2/PoRBFT.ts
  • src/libs/consensus/v2/routines/deterministicOrder.test.ts
  • src/libs/consensus/v2/routines/deterministicOrder.ts
  • src/libs/consensus/v2/routines/getShard.ts
  • testing/devnet/genesis.devnet.json
  • testing/devnet/scripts/multi_rpc_sequential_nonce.mjs

The consensus fixes in this PR were built and verified against demosdk
4.0.12. Pin it so CI and other nodes build against the same SDK the
fix was validated on.
@tcsenpai

Contributor Author

@greptile review

@greptile-apps

greptile-apps Bot commented Jun 25, 2026


Greptile Summary

This PR updates consensus handling for cross-RPC native transactions. The main changes are:

  • Deterministic mempool ordering by sender, nonce, and hash.
  • Gas-fee edits derived at apply time instead of mutating signed transactions at ingress.
  • Ingress and apply-time GCR edit verification split by whether fee edits are expected.
  • Minimum shard participation floor before block finalization.
  • Locale-independent peer sorting for shard selection.
  • Two-node devnet fixture and multi-RPC sequential nonce verification script updates.
  • @kynesyslabs/demosdk dependency bump to 4.0.12.

Confidence Score: 5/5

The consensus changes are focused and include targeted coverage for deterministic transaction ordering and two-node devnet behavior.

No blocking correctness issues were identified in the reviewed changes, and the implementation aligns with the stated goal of preventing cross-RPC transaction divergence.

T-Rex Logs

What T-Rex did

  • The cross-RPC stability check was attempted, and base and head captures reveal docker is missing, blocking docker-compose from starting the 2-node devnet required for cross-RPC submission.
  • The confirm-time mutation experiment was performed, showing that the base state had rpc_address evolve and gcr_edits grow, with coherenceAfterConfirm invalid, while the head state kept rpc_address null, gcr_edits unchanged, and coherenceAfterConfirm valid; apply-time fee derivation also appeared with eight edits and treasury adjustments.
  • The deterministic order test ran, capturing base and head runs; the base showed divergent outputs and ordered_hashes_equal=false, while the head run produced identical ordered outputs with ordered_hashes_equal=true and the scenario path deterministically applied.
  • The min shard test ran, showing a base state without MIN_SHARD returning valid for pro=1,totalVotes=1, and an after state with MIN_SHARD=2 producing invalid results for some combinations and valid for pro=2,totalVotes=2; the sorting path shifted to a byte-order comparator aligned with the environment’s locale ordering.

Ran code and verified through T-Rex.

Reviews (8): Last reviewed commit: "fix: per-component fee binding + correct..."

Comment thread src/libs/blockchain/routines/applyGasFeeSeparation.ts Outdated
@greptile-apps

greptile-apps Bot commented Jun 25, 2026


Greptile Summary

This PR fixes cross-RPC transaction freezing by keeping signed transactions stable and making consensus ordering deterministic. The main changes are:

  • Derive gas-fee distribution edits during apply instead of mutating signed transactions at confirmation.
  • Verify ingress transactions without expecting derived fee edits, while apply-time checks still bind the derived edits.
  • Sort merged mempools by sender, nonce, and hash behind the nonce enforcement fork.
  • Reject block finalization below a two-validator shard floor.
  • Replace locale-sensitive shard peer sorting with byte-stable identity ordering.
  • Add deterministic ordering tests and devnet verification assets.

Confidence Score: 4/5

The consensus changes are focused and backed by deterministic-ordering tests and devnet verification assets, with one devnet helper script issue needing cleanup.

The core transaction-stability approach is coherent and the risky mutation path is addressed, but the included multi-RPC verification script currently does not match the documented two-node devnet setup.

testing/devnet/scripts/multi_rpc_sequential_nonce.mjs

T-Rex Logs

What T-Rex did

  • Imported the real deterministicOrder.ts orderDeterministically function and verified that all three permutations produced the same sender, nonce, and hash order.
  • Compared the derive-at-apply-fees results before and after the change and documented that the after state shows valid transaction coherence with the expected ledger changes.
  • Updated the min shard floor logic to introduce MIN_SHARD and observed the head MIN_SHARD=2; pro=1,totalVotes=1 now false, while pro=2,totalVotes=2 remains true.
  • Changed the shard-byte sort to use a byte comparator instead of localeCompare in the head branch.

Ran code and verified through T-Rex.

Reviews (2): Last reviewed commit: "chore(deps): bump @kynesyslabs/demosdk 4..."

Comment thread testing/devnet/scripts/multi_rpc_sequential_nonce.mjs
…ipt [ep#21]

P1 (fee bypass): applyGasFeeSeparation now BINDS the SDK-shipped
transaction_fee to the node-computed breakdown — rejects any tx whose shipped
fee TOTAL != computed total (in OS). Without this, since the fee is charged at
apply from the shipped fields, a client could ship {0,0,0} and pay nothing.
Bound on TOTAL (not per-component): the sender commits to the total, the node
owns the network/rpc/additional split. deriveFeeEditsForApply derives from the
shipped transaction_fee — the SAME source verifyGcrEditsMatch regenerates from
at apply — so injected edits and the binding check agree (a breakdown-based
derive diverged from the shipped-based regen and false-rejected every tx).

P2: multi_rpc_sequential_nonce.mjs now requires >=2 RPCs (NODE3_URL optional),
matching the documented 2-validator RC devnet, instead of hard-failing without
a 3rd.
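The relaxed RPC requirement, sketched as a testable function. `NODE1_URL` and `NODE2_URL` are assumed variable names (only `NODE3_URL` appears above); `resolveNodeUrls` is hypothetical. The shape follows the described fix: drop unset entries, then require at least two.

```typescript
// Hypothetical: NODE1_URL and NODE2_URL are assumed variable names; only
// NODE3_URL appears in the description above, and it is optional.
function resolveNodeUrls(env: Record<string, string | undefined>): string[] {
  const urls = [env.NODE1_URL, env.NODE2_URL, env.NODE3_URL].filter(
    (u): u is string => Boolean(u),
  )
  if (urls.length < 2) {
    throw new Error(`need at least 2 RPC URLs, got ${urls.length}`)
  }
  return urls
}

console.log(resolveNodeUrls({ NODE1_URL: "http://node1", NODE2_URL: "http://node2" }).length) // 2
```

In a Node script this would be called as `resolveNodeUrls(process.env)`; passing the environment in keeps the check testable.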

Verified on 2-node devnet: legit tx still commits; fee bound at ingress.

Part of epic #21 / PR #948.
@tcsenpai

Contributor Author

@greptile review

Comment thread testing/devnet/scripts/multi_rpc_sequential_nonce.mjs
@tcsenpai

Contributor Author

@greptile review

Comment thread src/libs/blockchain/gcr/handleGCR.ts
Comment thread src/libs/blockchain/routines/applyGasFeeSeparation.ts Outdated
…ile T-Rex findings) [ep#21]

P1 (fee distribution correctness): applyGasFeeSeparation now binds the shipped
transaction_fee to the node-computed breakdown PER-COMPONENT (network/rpc/
additional), not just on total. Total-only binding let a client skew the split
(e.g. whole fee in network_fee, rpc_fee=0) and pass validation while apply
emitted no RPC-fee block -> deterministically wrong distribution. Per-component
binding rejects skew; legit txs (SDK ships the same split the node computes)
pass. Consistent with deriveFeeEditsForApply, which charges from the same
shipped components.
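Per-component binding, as a minimal sketch. `FeeBreakdown`, `bindShippedFee` and the result type are hypothetical; amounts are assumed to be integers in OS. The example shows why a total-only check is not enough: `{3, 0, 0}` and `{2, 1, 0}` have the same total but a different split.

```typescript
// Hypothetical shape; amounts are assumed to be integers in OS.
interface FeeBreakdown {
  network_fee: number
  rpc_fee: number
  additional_fee: number
}

type BindResult = { ok: true } | { ok: false; reason: string }

const FEE_COMPONENTS = ["network_fee", "rpc_fee", "additional_fee"] as const

// Reject unless every shipped component equals the node-computed one.
// A total-only check would accept a skewed split such as {3, 0, 0} vs {2, 1, 0}.
function bindShippedFee(shipped: FeeBreakdown, computed: FeeBreakdown): BindResult {
  for (const key of FEE_COMPONENTS) {
    if (shipped[key] !== computed[key]) {
      return { ok: false, reason: `${key}: shipped ${shipped[key]} != computed ${computed[key]}` }
    }
  }
  return { ok: true }
}

const computed: FeeBreakdown = { network_fee: 2, rpc_fee: 1, additional_fee: 0 }
console.log(bindShippedFee({ network_fee: 2, rpc_fee: 1, additional_fee: 0 }, computed).ok) // true
console.log(bindShippedFee({ network_fee: 3, rpc_fee: 0, additional_fee: 0 }, computed).ok) // false (skew)
console.log(bindShippedFee({ network_fee: 0, rpc_fee: 0, additional_fee: 0 }, computed).ok) // false (underpay)
```

Since the apply-time fee edits are derived from the same shipped components, passing this check also guarantees the distribution that apply will emit.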

P2 (verify script): sample() sliced infos by N (tx count) instead of
clients.length (RPC count), mixing sender/receiver objects when N != client
count (2-RPC devnet, N=3). Slice by clients.length.
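The slicing bug, illustrated. This assumes a hypothetical layout in which `infos` holds one sender entry per RPC client followed by the receiver entries; `splitInfos` is an illustrative helper, not the script's code.

```typescript
// Hypothetical layout: infos holds one sender entry per RPC client,
// followed by the receiver entries.
function splitInfos<T>(infos: readonly T[], clientCount: number): { senders: T[]; receivers: T[] } {
  return { senders: infos.slice(0, clientCount), receivers: infos.slice(clientCount) }
}

const infos = ["sender1", "sender2", "receiver1", "receiver2"]
const clientCount = 2 // 2-RPC devnet
const txCount = 3 // N

const buggy = infos.slice(0, txCount) // mixes a receiver into the senders
const fixed = splitInfos(infos, clientCount)
console.log(buggy, fixed.senders, fixed.receivers)
```

Slicing by N only works by coincidence when N equals the number of clients; slicing by `clients.length` matches the layout regardless of how many txs are sent.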

Verified on 2-node devnet: 'all 3 sequential-nonce txs landed in order across
RPCs' — binding passes legit txs, slice reports correctly.

Part of epic #21 / PR #948.
@tcsenpai

Contributor Author

@greptile review

@tcsenpai

Contributor Author

All Greptile findings addressed and verified on the 2-node devnet:

  • P1 fee bypass / distribution: applyGasFeeSeparation now binds the shipped transaction_fee to the node-computed breakdown per-component (commit b17ee66); a skewed or underpaid split is rejected. deriveFeeEditsForApply charges from the same shipped components as verifyGcrEditsMatch regenerates.
  • P2 script needs 3 RPCs — fixed: NODE_URLS is .filter(Boolean), requires >=2 (commit 6bab5cf).
  • P2 sample slicing — fixed: slice(0, clients.length) / slice(clients.length) (commit b17ee66).

The latest Greptile pass re-emitted the same three comments against the new commit, but the code at those exact lines already contains the fixes (verified). Treating them as stale re-flags; threads resolved.

tcsenpai merged commit 01ef0a0 into stabilisation Jun 25, 2026
4 checks passed