fix: address four peer-reachable security issues #4845

Open
CoderZhi wants to merge 1 commit into master from fix-peer-attack-vectors
@CoderZhi CoderZhi commented Jun 1, 2026

Collaborator

Summary

Four independent HIGH-severity issues reachable by any peer, fixed
together because they all live on the unauthenticated network surface:

  • nodeinfo: HandleNodeInfo dereferenced msg.Info without a nil
    check, so any peer broadcasting NODE_INFO with Info=nil crashed
    the receiving node. Guard msg / msg.Info up front.
  • actsync: ActionSync.actions was an unbounded sync.Map keyed by
    hash. Flooding forged ACTION_HASH messages drove the map to OOM,
    and the periodic triggerSync amplified the flood into per-hash
    unicast requests to neighbors. Cap live entries at cfg.Size via an
    atomic counter; LoadOrStore / LoadAndDelete refund the slot on
    dedup and on receipt.
  • server/itx: the admin mux (/pause, /unpause, /producer-keys,
    pprof) bound to all interfaces on :HTTPAdminPort. Any host reachable
    on that port could halt block production with an unauthenticated
    POST /pause. Bind to 127.0.0.1 only.
  • actpool: the pool is sharded into 16 workers by
    addr.Bytes()[last]%16. The MaxNumActsPerPool full-check was global
    but the eviction (worker.accountActs.PopPeek) ran on the receiving
    worker's local shard. Filling one shard with gapped future-nonce
    min-fee spam from grinded same-shard accounts forced honest senders
    in the other 15 shards to evict their own actions while the attacker
    paid nothing. Add popLowestPriorityAcrossWorkers, which locks all
    workers in index order and evicts the globally-lowest-priority head,
    matching accountPriorityQueue.Less.
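
The actsync cap described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `boundedSet`, `add`, `remove`, and `size` are hypothetical names, and the sketch assumes the slot is reserved before the `LoadOrStore` call and refunded on rejection.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// boundedSet is a hypothetical stand-in for ActionSync.actions: a sync.Map
// whose live entry count is capped by an atomic counter.
type boundedSet struct {
	entries sync.Map
	live    atomic.Int64
	limit   int64
}

// add reserves a slot and then stores the hash. It returns false when the set
// is full or the hash is already tracked; both paths refund the reservation.
func (s *boundedSet) add(hash string) bool {
	if s.live.Add(1) > s.limit {
		s.live.Add(-1) // over capacity: refund and drop
		return false
	}
	if _, loaded := s.entries.LoadOrStore(hash, struct{}{}); loaded {
		s.live.Add(-1) // duplicate hash: refund so it is not double-counted
		return false
	}
	return true
}

// remove releases the slot once the action for this hash has been received.
func (s *boundedSet) remove(hash string) {
	if _, loaded := s.entries.LoadAndDelete(hash); loaded {
		s.live.Add(-1)
	}
}

// size reports the current number of reserved slots.
func (s *boundedSet) size() int64 { return s.live.Load() }

func main() {
	s := &boundedSet{limit: 8}
	var wg sync.WaitGroup
	// Simulate a flood of unique hashes arriving concurrently.
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.add(fmt.Sprintf("hash-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println("live entries:", s.size(), "limit:", s.limit)
}
```

Reserving first means the number of stored entries cannot exceed the limit, at the cost of occasionally rejecting a hash while another goroutine's refund is still in flight.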

Why grouped

Each fix is small, and separating them would create four parallel
branches for the same hardening pass. They were all surfaced together as
peer-reachable HIGH-severity findings. None depends on another, and
bisecting is straightforward (each fix is isolated to its own
file/package).

Deadlock analysis (actpool)

popLowestPriorityAcrossWorkers is the only site in the package that
holds more than one worker.mu at a time. It acquires all 16 worker
mutexes in a fixed index order; every other call site holds at most a
single worker's mutex, so no lock-order cycle can form.
removeInvalidActs (which fires subscriber onRemoved callbacks) now runs
outside the worker.mu region; the prior code held the local worker mutex
while firing those callbacks, so this change is strictly safer for
subscriber re-entrancy.
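
The locking discipline above can be illustrated with a small sketch. It is not the actpool code: `shard`, `pool`, `newPool`, `add`, and `evictOneFromEach` are hypothetical names. The sketch assumes only what the analysis states: one site takes every mutex in ascending index order, all other sites take a single mutex, and callbacks fire after the locks are released.

```go
package main

import (
	"fmt"
	"sync"
)

// shard is a hypothetical stand-in for one actpool worker.
type shard struct {
	mu    sync.Mutex
	items []string
}

// pool holds the shards and a subscriber callback (hypothetical names).
type pool struct {
	shards    []*shard
	onRemoved func(item string)
}

func newPool(n int) *pool {
	p := &pool{shards: make([]*shard, n), onRemoved: func(string) {}}
	for i := range p.shards {
		p.shards[i] = &shard{}
	}
	return p
}

// add takes a single shard mutex, like every ordinary call site.
func (p *pool) add(idx int, item string) {
	s := p.shards[idx]
	s.mu.Lock()
	s.items = append(s.items, item)
	s.mu.Unlock()
}

// evictOneFromEach is the only multi-lock site. It locks every shard in
// ascending index order, pops one item per non-empty shard, releases the
// locks, and only then fires the callbacks.
func (p *pool) evictOneFromEach() int {
	for _, s := range p.shards {
		s.mu.Lock()
	}
	var removed []string
	for _, s := range p.shards {
		if n := len(s.items); n > 0 {
			removed = append(removed, s.items[n-1])
			s.items = s.items[:n-1]
		}
	}
	for i := len(p.shards) - 1; i >= 0; i-- {
		p.shards[i].mu.Unlock()
	}
	// No shard mutex is held here, so a subscriber may call back into the pool.
	for _, it := range removed {
		p.onRemoved(it)
	}
	return len(removed)
}

func main() {
	p := newPool(16)
	p.onRemoved = func(item string) { p.add(0, item) } // re-entrant subscriber
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				p.add((g+i)%16, fmt.Sprintf("act-%d-%d", g, i))
				if i%50 == 0 {
					p.evictOneFromEach()
				}
			}
		}(g)
	}
	wg.Wait()
	fmt.Println("finished without deadlock")
}
```

If the callbacks fired while the shard mutexes were still held, the re-entrant subscriber in `main` would block forever on shard 0, which is the hazard the change avoids.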

Test plan

  • go test ./actpool/ ./actsync/ ./nodeinfo/ -race — 107 pass
  • go build ./... — clean
  • nodeinfo: nil_msg_and_nil_info no-panic case
  • actsync — flood capacity, duplicate dedup, concurrent flood
    respects cap under -race
  • actpool — direct global-pop unit, headLess truth table
    mirroring accountPriorityQueue.Less, end-to-end Add cross-shard
    eviction, ErrTxPoolOverflow when newcomer is globally lowest,
    concurrent multi-shard stress for deadlock detection (-race)
  • CI green

Notes for reviewers

  • The admin-mux change drops external reachability of the admin
    endpoints. Anyone using /ha, /producer-keys, or pprof over the
    network will need an SSH tunnel or a sidecar proxy instead. Default
    config disables the admin port (HTTPAdminPort: 0), so production
    nodes that haven't opted in are unaffected.
  • Under sustained pool overflow the global eviction briefly serializes
    all 16 workers. Overflow is the rare path; the alternative (per-shard
    eviction) is the bug being fixed.
  • The actpool global full-check is read-then-act and can transiently
    overshoot MaxNumActsPerPool under concurrency. This is pre-existing
    behavior and not introduced by this PR.

🤖 Generated with Claude Code

Four independent HIGH-severity issues reachable by any peer, fixed
together because they all sit on the unauthenticated network surface:

nodeinfo: HandleNodeInfo dereferenced msg.Info without a nil check, so
  broadcasting NODE_INFO with Info=nil crashed the receiving node. Guard
  msg / msg.Info up front and log + drop the malformed message.
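
A minimal sketch of the guard, assuming simplified stand-in types: `Info`, `NodeInfoMsg`, `handleNodeInfo`, and `errMalformed` are hypothetical and do not mirror the real protobuf messages or handler signature. The caller is assumed to log the error and drop the message.

```go
package main

import (
	"errors"
	"fmt"
)

// Info and NodeInfoMsg are hypothetical stand-ins for the wire types.
type Info struct{ Version string }

type NodeInfoMsg struct{ Info *Info }

var errMalformed = errors.New("malformed node info message")

// handleNodeInfo rejects a nil message or nil payload before any field is
// dereferenced, so a malformed broadcast cannot panic the receiver.
func handleNodeInfo(msg *NodeInfoMsg) (string, error) {
	if msg == nil || msg.Info == nil {
		return "", errMalformed
	}
	return msg.Info.Version, nil
}

func main() {
	if _, err := handleNodeInfo(&NodeInfoMsg{}); err != nil {
		fmt.Println("dropped:", err)
	}
	v, _ := handleNodeInfo(&NodeInfoMsg{Info: &Info{Version: "v2"}})
	fmt.Println("accepted:", v)
}
```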

actsync: ActionSync.actions was an unbounded sync.Map keyed by hash.
  A peer flooding forged ACTION_HASH messages with unique hashes grew
  the map without bound (OOM), and the periodic triggerSync amplified the
  flood into per-hash unicast requests to neighbors. Cap live entries
  at cfg.Size via an atomic counter; LoadOrStore / LoadAndDelete refund
  the slot on dedup and on receipt.

server/itx: the admin mux (/pause, /unpause, /producer-keys, pprof)
  bound to all interfaces on :HTTPAdminPort. Any host reachable on that
  port could halt block production with an unauthenticated POST /pause.
  Bind to 127.0.0.1 only.

actpool: the pool is sharded into 16 workers by addr.Bytes()[last]%16.
  The MaxNumActsPerPool full-check was global but the eviction
  (worker.accountActs.PopPeek) ran on the receiving worker's local
  shard. An attacker filling one shard with gapped future-nonce min-fee
  spam from grinded same-shard accounts forced honest senders in the
  other 15 shards to evict their own actions while the attacker paid
  nothing. Add popLowestPriorityAcrossWorkers, which locks all workers
  in index order and evicts the globally-lowest-priority head, matching
  accountPriorityQueue.Less. This is the only multi-worker-mu site in
  the package, so no deadlock cycle can be introduced.
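
The difference between per-shard and global eviction can be sketched as a selection over the shard heads. This is illustrative only: `head`, `less`, and `lowestAcross` are hypothetical, locking is omitted, and the ordering (lower gas price means lower priority) is an assumption that may not match what accountPriorityQueue.Less compares.

```go
package main

import "fmt"

// head describes the lowest-priority entry of one shard (hypothetical fields).
type head struct {
	account  string
	gasPrice uint64
}

// less is an assumed ordering: a lower gas price means a lower priority.
func less(a, b head) bool { return a.gasPrice < b.gasPrice }

// lowestAcross returns the index of the shard whose head has the lowest
// priority, or -1 if every shard is empty. A nil entry marks an empty shard.
// Ties resolve to the lowest shard index.
func lowestAcross(heads []*head) int {
	best := -1
	for i, h := range heads {
		if h == nil {
			continue
		}
		if best == -1 || less(*h, *heads[best]) {
			best = i
		}
	}
	return best
}

func main() {
	heads := make([]*head, 16)
	heads[0] = &head{account: "honest", gasPrice: 10}
	heads[3] = &head{account: "spam", gasPrice: 1}
	// A newcomer arriving on shard 0 would, under per-shard eviction, push out
	// the honest entry. The global scan selects the spam entry on shard 3.
	fmt.Println("evict from shard", lowestAcross(heads))
}
```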

Tests:
- nodeinfo: nil-msg + nil-Info no-panic case.
- actsync: cap respected under flood, dedup must not double-count,
  concurrent flood respects cap.
- actpool: direct global-pop unit tests, headLess truth table mirroring
  accountPriorityQueue.Less, end-to-end Add cross-shard eviction,
  ErrTxPoolOverflow when newcomer is globally lowest, concurrent
  multi-shard stress for deadlock detection. All pass with -race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CoderZhi CoderZhi requested a review from a team as a code owner June 1, 2026 08:25
@sonarqubecloud

sonarqubecloud Bot commented Jun 1, 2026


Quality Gate failed

Failed conditions
16.7% Duplication on New Code (required ≤ 3%)

