fix: address four peer-reachable security issues#4845
Open
CoderZhi wants to merge 1 commit into
Open
Conversation
Four independent HIGH-severity issues reachable by any peer, fixed together because they all sit on the unauthenticated network surface: nodeinfo: HandleNodeInfo dereferenced msg.Info without a nil check, so broadcasting NODE_INFO with Info=nil crashed the receiving node. Guard msg / msg.Info up front and log + drop the malformed message. actsync: ActionSync.actions was an unbounded sync.Map keyed by hash. A peer flooding forged ACTION_HASH messages with unique hashes drove the map without bound (OOM) and the periodic triggerSync amplified the flood into per-hash unicast requests to neighbors. Cap live entries at cfg.Size via an atomic counter; LoadOrStore / LoadAndDelete refunds the slot on dedup and on receipt. server/itx: the admin mux (/pause, /unpause, /producer-keys, pprof) bound to all interfaces on :HTTPAdminPort. Any host reachable on that port could halt block production with an unauthenticated POST /pause. Bind to 127.0.0.1 only. actpool: the pool is sharded into 16 workers by addr.Bytes()[last]%16. The MaxNumActsPerPool full-check was global but the eviction (worker.accountActs.PopPeek) ran on the receiving worker's local shard. An attacker filling one shard with gapped future-nonce min-fee spam from grinded same-shard accounts forced honest senders in the other 15 shards to evict their own actions while the attacker paid nothing. Add popLowestPriorityAcrossWorkers, which locks all workers in index order and evicts the globally-lowest-priority head, matching accountPriorityQueue.Less. This is the only multi-worker-mu site in the package, so no deadlock cycle can be introduced. Tests: - nodeinfo: nil-msg + nil-Info no-panic case. - actsync: cap respected under flood, dedup must not double-count, concurrent flood respects cap. - actpool: direct global-pop unit tests, headLess truth table mirroring accountPriorityQueue.Less, end-to-end Add cross-shard eviction, ErrTxPoolOverflow when newcomer is globally lowest, concurrent multi-shard stress for deadlock detection. All pass with -race. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


Summary
Four independent HIGH-severity issues reachable by any peer, fixed
together because they all live on the unauthenticated network surface:
HandleNodeInfodereferencedmsg.Infowithout a nilcheck, so any peer broadcasting
NODE_INFOwithInfo=nilcrashedthe receiving node. Guard
msg/msg.Infoup front.ActionSync.actionswas an unboundedsync.Mapkeyed byhash. Flooding forged
ACTION_HASHmessages drove the map to OOM,and the periodic
triggerSyncamplified the flood into per-hashunicast requests to neighbors. Cap live entries at
cfg.Sizevia anatomic counter;
LoadOrStore/LoadAndDeleterefund the slot ondedup and on receipt.
/pause,/unpause,/producer-keys,pprof) bound to all interfaces on
:HTTPAdminPort. Any host reachableon that port could halt block production with an unauthenticated
POST /pause. Bind to127.0.0.1only.addr.Bytes()[last]%16. TheMaxNumActsPerPoolfull-check was globalbut the eviction (
worker.accountActs.PopPeek) ran on the receivingworker's local shard. Filling one shard with gapped future-nonce
min-fee spam from grinded same-shard accounts forced honest senders
in the other 15 shards to evict their own actions while the attacker
paid nothing. Add
popLowestPriorityAcrossWorkers, which locks allworkers in index order and evicts the globally-lowest-priority head,
matching
accountPriorityQueue.Less.Why grouped
Each fix is small, but separating them creates four parallel branches
on the same hardening pass. They were all surfaced together as
peer-reachable HIGH-severity findings. None of them depend on each
other; bisecting is straightforward (each fix is isolated to its own
file/package).
Deadlock analysis (actpool)
popLowestPriorityAcrossWorkersis the only multi-worker.musite inthe package. It acquires all 16 worker mutexes in a fixed index order;
every other call site holds at most a single worker's mutex. Therefore
no cycle can form.
removeInvalidActs(which fires subscriberonRemovedcallbacks) now runs outside the worker-mu region; theprior code held the local worker mutex while firing those callbacks,
so this change is strictly safer for subscriber re-entrancy.
Test plan
go test ./actpool/ ./actsync/ ./nodeinfo/ -race— 107 passgo build ./...— cleannil_msg_and_nil_infono-panic caserespects cap under
-raceheadLesstruth tablemirroring
accountPriorityQueue.Less, end-to-end Add cross-shardeviction,
ErrTxPoolOverflowwhen newcomer is globally lowest,concurrent multi-shard stress for deadlock detection (
-race)Notes for reviewers
endpoints. Anyone using
/ha,/producer-keys, or pprof over thenetwork will need an SSH tunnel or a sidecar proxy instead. Default
config disables the admin port (
HTTPAdminPort: 0), so productionnodes that haven't opted in are unaffected.
all 16 workers. Overflow is the rare path; the alternative (per-shard)
is the bug being fixed.
overshoot
MaxNumActsPerPoolunder concurrency. This is pre-existingbehavior and not introduced by this PR.
🤖 Generated with Claude Code