Skip to content

fix(staking): BLS proof-of-possession at candidate register/update (CRITICAL, pre-IIP-52)#4854

Open
envestcc wants to merge 9 commits into
iotexproject:masterfrom
envestcc:bls-pop-fix
Open

fix(staking): BLS proof-of-possession at candidate register/update (CRITICAL, pre-IIP-52)#4854
envestcc wants to merge 9 commits into
iotexproject:masterfrom
envestcc:bls-pop-fix

Conversation

@envestcc

Copy link
Copy Markdown
Member

Security severity: CRITICAL, latent. Replaces #4853 — this PR is based directly on master, NOT stacked on the IIP-52 PR chain. The vulnerable registration path (`crypto.BLS12381PublicKeyFromBytes` only, no PoP) is already live on master; IIP-52 turns the latent bug into a live exploit, but the un-attested BLS pubkeys can be collected starting now.

Depends on iotex-proto#173 (adds the `blsPop` field on `CandidateBasicInfo`).

Why land BEFORE IIP-52

Master today has the full BLS registration infrastructure: `Candidate.BlsPubKey` proto field 5, `candidateRegisterWithBLS` ABI method, `crypto.BLS12381PublicKeyFromBytes` (format + subgroup check only, no PoP). Every BLS pubkey that gets registered between now and the IIP-52 activation block is a potential attack vector — once `FastAggregateVerify` goes live, all un-attested pubkeys become exploitable.

Activating `EnforceBLSPoP` BEFORE the IIP-52 fork means:

  • New registrations from this PR onward all carry verified PoPs
  • The pre-fork migration scope shrinks to existing pubkeys registered before this PR lands
  • IIP-52 doesn't have to ship the PoP check — it just relies on the gate already being in place

The bug

IIP-52's footer aggregation calls

```go
crypto.BLSAggregateSignatureFromBytes(sig).Verify(pubKeys, msg)
// → blst.FastAggregateVerify(true, pubKeys, msg, dst)
```

with DST `BLS_SIG_BLS12381G2_XMD:SHA-256_SSWU_RO_POP_`. The `POP` suffix names the IRTF Proof-of-Possession ciphersuite, sound only if every registered pubkey arrived with a possession proof.

Today `action/candidate_register.go` and `action/candidate_update.go` validate only:

```go
_, err := crypto.BLS12381PublicKeyFromBytes(blsPubKey) // format + subgroup
```

No PoP. The proto has no PoP field.

Attack (becomes exploitable when IIP-52 activates)

A single active delegate:

  1. Reads other delegates' BLS pubkeys from chain
  2. Picks any secret `x` they control
  3. Computes `pk_rogue = g^x − Σ(other pubkeys)` in G1 (public information only)
  4. Registers `pk_rogue` via `candidateRegisterWithBLS` — current handler accepts
  5. After IIP-52 activates, submits a footer with all-ones bitmap and `σ = sign(x, msg)`
  6. `FastAggregateVerify` collapses `Σ pk_i + pk_rogue` to `g^x`. `σ` verifies. `isMajorityCount` passes.

One delegate forges a 2/3+ quorum certificate → finalizes arbitrary blocks → consensus safety break.

The fix (Eth2 `process_deposit` pattern)

```go
// action/protocol/staking/bls_pop.go
const blsPopDomain = "IOTEX_BLS_POP_v1"

func BLSPopSigningRoot(blsPubKey []byte, ownerAddress address.Address) []byte {
h := sha256.New()
h.Write([]byte(blsPopDomain))
h.Write(blsPubKey)
h.Write(ownerAddress.Bytes())
return h.Sum(nil)
}
```

Three values bind into the signed message:

Binding Closes
`blsPubKey` Rogue key — attacker without the secret cannot sign the canonical message
`ownerAddress` Candidate substitution — PoP for owner A doesn't validate for owner B
`IOTEX_BLS_POP_v1` prefix Cross-domain replay — PoP disjoint from any other BLS-signed iotex message; version suffix reserves room for a future fork to rotate

Implementation

  • Proto (iotex-proto#173): `bytes blsPop = 5` on `CandidateBasicInfo`. go.mod replace updated to envestcc/iotex-proto@7a486f6.

  • FeatureCtx.EnforceBLSPoP: `g.IsToBeEnabled(height)`. Standalone flag — does NOT share state with any IIP-52 flag, so this PR is fully independent.

  • action.CandidateRegister / CandidateUpdate: carry `blsPop` through constructors, Proto/LoadProto, accessors. `NewCandidateRegisterWithBLS` and `NewCandidateUpdateWithBLS` now take a `blsPop` parameter; pre-fork callers may pass nil.

  • handlers.go: `handleCandidateRegister`, `handleCandidateUpdate`, and `handleCandidateUpdateByOperator` call `VerifyBLSPop` before writing `Candidate.BLSPubKey` when `EnforceBLSPoP` is active. Failure → `ReceiptStatus_ErrUnauthorizedOperator`.

  • bls_pop_test.go: 5 cases including `TestBLSPop_RogueKeyAttackBlocked` as the regression guard — models the attacker registering a pubkey for which they don't have the secret; demonstrates no PoP they can produce (under any secret they control) validates against that pubkey; control case confirms a legitimate registrant succeeds.

Migration (out of scope for this PR)

Existing pre-PoP-merge candidate records have no PoP. Before `EnforceBLSPoP` activates, all active candidates must either:

  • Submit a `candidateUpdate` carrying a valid `blsPop` (grace period), or
  • Be dropped from `ActiveCandidates` at the fork height via a state migration

This is a coordination problem (talking to delegates, picking a grace window), not a code problem. IIP draft will spell it out.

Out of scope for this PR (follow-ups)

  • ioctl / SDK wiring: `stake2register.go` and `stake2update.go` pass nil for `blsPop` with TODO markers. They need a flag to ingest a BLS private key and derive the PoP. Pre-fork accepted; post-fork rejected.

  • Dedicated receipt status code: reusing `ErrUnauthorizedOperator` (closest semantic match) to keep proto-change scope minimal. Adding `ErrInvalidBLSPoP` is another proto bump and not necessary for correctness.

Test plan

  • `go test ./action/protocol/staking/ ./action/` — passing locally (470 cases including 5 new PoP tests)
  • `go build ./...` clean
  • Rogue-key regression guard: `TestBLSPop_RogueKeyAttackBlocked` ✓
  • CI green on master + iotex-proto#173
  • Coordinate migration plan with delegates before flipping fork height

🤖 Generated with Claude Code

envestcc and others added 2 commits June 11, 2026 10:14
…pdate

CRITICAL security fix for the BLS rogue-key aggregate-forgery attack
against IIP-52's same-message FastAggregateVerify path.

IIP-52 verifies the COMMIT-vote BLS aggregate signature against the sum
of registered candidate BLS public keys:

    blstAggregateSignature.FastAggregateVerify(true, Σ pk_i, msg, dst)

with dst = "BLS_SIG_BLS12381G2_XMD:SHA-256_SSWU_RO_POP_". The _POP_
suffix names the IRTF "Proof of Possession" ciphersuite, which is only
sound when each registered pubkey has been accompanied at registration
time by a signature proving knowledge of the corresponding secret key.

Today registration validates only key format + subgroup membership:

    crypto.BLS12381PublicKeyFromBytes(blsPubKey)

No PoP is required. A registered candidate can publish

    pk_rogue = g^x − Σ(other delegates' pubkeys)

computed from public information, supply a single signature
σ = sign(x, msg) and a bitmap that claims every delegate signed, and
FastAggregateVerify collapses Σ pk_i + pk_rogue to g^x, accepting σ.
The bitmap satisfies isMajorityCount, so one delegate forges a 2/3+
quorum certificate and the consensus safety property is broken.

The window is latent: IIP-52's ToBeEnabledBlockHeight = math.MaxUint64
so this code path is not live on any chain yet. It must be fixed
before activation.

- Add a 96-byte blsPop field to CandidateBasicInfo proto (iotex-proto
  PR pushed; go.mod replace updated to envestcc/iotex-proto@7a486f6)

- New action/protocol/staking/bls_pop.go defines the PoP signing root:

    BLSPopSigningRoot(blsPubKey, ownerAddress) =
        SHA-256("IOTEX_BLS_POP_v1" || blsPubKey || ownerAddress.Bytes())

  Binding all three values blocks the three replay variants:
  (a) attacker without the secret cannot sign over the canonical
      message at all — closes rogue key;
  (b) PoP for owner A does not validate for owner B — closes
      candidate-substitution replay;
  (c) "IOTEX_BLS_POP_v1" domain prefix is disjoint from any other BLS
      DST iotex uses, so a PoP cannot be replayed as a consensus vote
      or vice versa, and a future fork can rotate via "v2".

- New FeatureCtx.EnforceBLSPoP shares IsToBeEnabled height with
  EnableBLSAggregation today but is a semantically distinct switch so
  the two can diverge if a hotfix or fork-height adjustment is needed.

- action.CandidateRegister and action.CandidateUpdate carry blsPop
  through their constructors, Proto/LoadProto, and accessors.
  NewCandidateRegisterWithBLS and NewCandidateUpdateWithBLS now accept
  a blsPop parameter; pre-fork callers may pass nil.

- handleCandidateRegister, handleCandidateUpdate, and
  handleCandidateUpdateByOperator call VerifyBLSPop before writing
  Candidate.BLSPubKey when EnforceBLSPoP is active. Failure returns
  ReceiptStatus_ErrUnauthorizedOperator.

- TestBLSPop_RogueKeyAttackBlocked is the regression guard: it models
  the attacker trying to register a pubkey for which they do not have
  the secret, demonstrates the only PoP they can produce (signed under
  their own secret) does NOT validate against the target pubkey, and
  confirms the control case (a legitimate registrant can produce a
  valid PoP).

- ioctl/SDK wiring: stake2register.go and stake2update.go currently
  pass nil for blsPop with TODO markers. These need a flag to ingest a
  BLS private key and derive the PoP. Pre-fork the empty PoP is
  accepted; post-fork the same call will fail with the new check, so
  the tooling MUST be updated before the fork.

- Migration: existing pre-fork candidate records have no PoP. Before
  the fork activates, all active candidates must submit a
  candidateUpdate carrying a valid blsPop; the alternative is a
  fork-block state migration that drops BLS pubkeys without
  accompanying PoPs from ActiveCandidates. IIP draft will document
  the chosen approach.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion to the BLS proof-of-possession fix. Even with PoP in place,
nothing in the current handler stops two different candidates from
registering with the same blsPubKey: the existing ContainsName /
ContainsOwner / ContainsOperator checks have no BLS equivalent.

Two delegates sharing one BLS pubkey breaks IIP-52's quorum-counting
model. The signer bitmap counts both delegates as having voted, but
FastAggregateVerify aggregates pubkeys as a set — one contribution per
distinct pubkey. The committee size as seen by aggregation is N, the
committee size as seen by the bitmap is N+1, and the second delegate's
stake-weight effectively "votes for free." This is a quieter cousin of
the rogue-key attack: the aggregation math still verifies; only the
accounting is off.

Adds CandidateCenter.ContainsBLSPubKey(blsPubKey, except) and surfaces
it through CandidateStateManager. Implementation is a linear scan over
candidates — registration / update are sparse calls and the active
delegate set is bounded; the saved O(1) lookup is not worth maintaining
a fourth index map across the change/base commit flow.

Hooks into all three handler paths under the EnforceBLSPoP gate:

- handleCandidateRegister: reject if blsPubKey is held by any
  candidate other than the incumbent (when re-registering against an
  existing owner-without-selfstake record).
- handleCandidateUpdate: reject if blsPubKey is held by any candidate
  other than the one being updated. A candidate keeping its own
  pubkey across updates is allowed (except = c.GetIdentifier()).
- handleCandidateUpdateByOperator: same as update-by-owner.

Failures return ReceiptStatus_ErrCandidateConflict (matches existing
collision semantics for name / operator). Gated by EnforceBLSPoP so
pre-fork blocks replay unchanged.

Test: TestCandidateCenter_ContainsBLSPubKey covers nil / empty / no-match
/ match-with-nil-except / match-against-self (allowed) / match-against-
other (rejected).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@envestcc

Copy link
Copy Markdown
Member Author

Pushed a companion commit 4d5389792 adding BLS pubkey uniqueness at register/update.

Why this belongs in the same PR

Even with PoP in place, two candidates registering the SAME blsPubKey breaks IIP-52's quorum-counting model. PoP only proves "I know the secret for this pubkey" — it does not prevent two delegates from independently producing valid PoPs for the same key (e.g. an actively shared keypair, or a delegate selling/leasing their secret).

When IIP-52 footer aggregation runs:

  • The signer bitmap counts each delegate as 1 — committee size = N+1
  • FastAggregateVerify sums pubkeys as a SET — committee size = N (duplicate dropped)
  • isMajorityCount(N+1) may pass even when only N distinct keys actually signed → quieter cousin of the rogue-key attack: aggregation math still verifies, accounting is off, second delegate's stake-weight votes for free

Mitigation matches the existing collision-check pattern: csm.ContainsBLSPubKey(blsPubKey, except) returns true if any candidate other than `except` already holds the pubkey, parallel to `ContainsName` / `ContainsOperator`.

Implementation

  • `CandidateCenter.ContainsBLSPubKey` — linear scan over `m.All()`. Registration is sparse and active-set is bounded; not worth maintaining a 4th index map across the change/base commit flow.
  • `CandidateStateManager` interface adds the new method; `candSM` delegates to `candCenter`.
  • All 3 handler paths (`handleCandidateRegister`, `handleCandidateUpdate`, `handleCandidateUpdateByOperator`) call it after PoP verification, returning `ErrCandidateConflict` on collision.
  • Gated under the same `EnforceBLSPoP` feature flag — pre-fork replay unchanged.
  • Update paths exempt the candidate-being-updated (`except = c.GetIdentifier()`) so re-registering / updating with one's own pubkey is allowed (no-op pubkey case).

Tests

  • `TestCandidateCenter_ContainsBLSPubKey` covers nil / empty / no-match / match-with-nil-except / match-against-self (allowed) / match-against-other (rejected).
  • Full staking + action suite: 472 passing.

…t owner

PoP at candidate update was binding to c.Owner — the *current* owner.
For post-Xingu candidates this drifts after CandidateTransferOwnership:
the new owner has to know they're the new owner at signing time, and a
PoP signed under the old owner during a concurrent transfer is
rejected purely on a value mismatch that has nothing to do with key
possession.

c.GetIdentifier() is the stable handle for this purpose:

- Post-Xingu non-collision (the common case): Identifier is set once
  at register time to the original owner address (generateCandidateID
  fast-paths to owner when it's free). So a PoP signed at register
  binding to owner == identifier, and a PoP signed at update binding
  to c.GetIdentifier() lines up with that same value, even if owner
  has since transferred.
- Post-Xingu collision (edge case): Identifier is a hash-derived
  address. Register still binds to act.OwnerAddress() (signer can't
  predict the hash), updates use c.GetIdentifier(). The register-time
  PoP doesn't get re-verified later, so this asymmetry is harmless;
  inside the candidate's lifetime, all update PoPs use the same hash
  consistently.
- Pre-Xingu: c.Identifier is nil and c.GetIdentifier() falls back to
  c.Owner — behavior identical to the prior code.

Renamed the function parameter from ownerAddress to candidateID
throughout (BLSPopSigningRoot, SignBLSPop, VerifyBLSPop) and updated
the docstring to spell out what to pass at register vs update. The
on-the-wire bytes are unchanged — only the conceptual binding shifts.

Tests: renamed TestBLSPop_RejectWrongOwner to RejectWrongCandidateID;
added TestBLSPop_StableAcrossOwnershipTransfer locking in the property
that a PoP signed under the original owner still verifies under the
same identifier after the candidate is transferred to a new owner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@envestcc

Copy link
Copy Markdown
Member Author

Pushed `3fbaa0dcd` — bind update-path PoP to `c.GetIdentifier()` instead of `c.Owner`.

Motivation

`c.Owner` shifts when ownership is transferred via `CandidateTransferOwnership`; `c.GetIdentifier()` is the candidate's stable handle. The previous owner-binding meant:

  • a new owner doing BLS rotation right after receiving a transfer had to sign with the new owner address — not wrong, but conceptually awkward
  • mid-flight concurrent transfers (`Alice → Bob → Carol` while Bob's update tx is in mempool) would invalidate Bob's PoP for a non-cryptographic reason

Why this isn't a big change

`generateCandidateID` returns the owner address directly when free, only hash-deriving on collision. For the common case identifier == registration-time owner, so:

  • register: PoP binds to `act.OwnerAddress()` (the future identifier in the non-collision case). Unchanged from before.
  • update: PoP binds to `c.GetIdentifier()`:
    • post-Xingu non-collision: same value as the register-time owner — register and update PoPs use identical bindings
    • post-Xingu collision (edge case): identifier is a hash; updates consistently use it across the candidate's lifetime
    • pre-Xingu: `GetIdentifier()` falls back to `c.Owner` — identical to the previous behavior

The on-the-wire PoP bytes don't change. Only the conceptual binding shifts from "current owner" to "candidate identity."

Implementation

  • `bls_pop.go`: renamed the parameter from `ownerAddress` to `candidateID` throughout (`BLSPopSigningRoot`, `SignBLSPop`, `VerifyBLSPop`); docstring spells out what to pass at register vs update
  • `handlers.go`: `handleCandidateUpdate` and `handleCandidateUpdateByOperator` switched from `c.Owner` to `c.GetIdentifier()` for the PoP binding (uniqueness check already used `c.GetIdentifier()`)
  • `bls_pop_test.go`: `TestBLSPop_RejectWrongOwner` → `TestBLSPop_RejectWrongCandidateID`; added `TestBLSPop_StableAcrossOwnershipTransfer` locking in the stability property

Test plan

  • 6 PoP tests pass including the new transfer-stability case
  • No on-the-wire changes — pre-fork replay unaffected

// which case the check rejects on any match. Used to enforce one
// BLS pubkey per delegate, a hard requirement for IIP-52's
// FastAggregateVerify quorum-counting model.
ContainsBLSPubKey(blsPubKey []byte, except address.Address) bool

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ContainsBLSPubKey should not need the except parameter.

Comment thread action/protocol/staking/bls_pop.go Outdated
h.Write([]byte(blsPopDomain))
h.Write(blsPubKey)
if candidateID != nil {
h.Write(candidateID.Bytes())

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nil if candididateID == nil ?

Comment thread action/protocol/staking/bls_pop.go Outdated
// for pre-Xingu records. This means a candidate that has been
// transferred to a new owner still uses its original identity for
// PoP, so the binding is stable across ownership transfers.
func BLSPopSigningRoot(blsPubKey []byte, candidateID address.Address) []byte {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blsPubKey could be deleted. It will be needed during verification.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed `fbfd7ae92` — dropped `blsPubKey` from the signing root per our discussion.

The signing root is now `H(blsPopDomain || candidateID)`. Security model:

  1. Basic rogue-key registration: blocked by the pairing verifier alone. `Verify(pk_rogue, msg, sig)` requires `sig` to be produced under `sk_rogue`, which the attacker doesn't have — regardless of what's in `msg`.
  2. Same-message aggregation (the classical attack that pubkey-in-message defends against): requires two distinct honest signers to sign the same root. Protocol-enforced uniqueness of candidate identifiers (generateCandidateID + ContainsName/Operator/Owner checks at register time) rules that out — no two honest delegates ever produce PoPs over the same root.
  3. Cross-candidate replay: still blocked by candidateID binding.
  4. Cross-domain replay: still blocked by blsPopDomain.

`SignBLSPop` no longer derives `pk` from `sk` since the signing root doesn't need it. PoP bytes are still 96 B G2-compressed; DST unchanged.

Tests updated: `TestBLSPop_RogueKeyAttackBlocked` + `TestBLSPop_RejectNilCandidateID` use the new `BLSPopSigningRoot(candidateID)` signature.

// the candidate identifier); at update time pass the candidate's
// existing identifier (c.GetIdentifier()).
func SignBLSPop(sk *crypto.BLS12381PrivateKey, candidateID address.Address) ([]byte, error) {
if sk == nil {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or return error if candidateID == nil

// which case the check rejects on any match. Used to enforce one
// BLS pubkey per delegate, a hard requirement for IIP-52's
// FastAggregateVerify quorum-counting model.
ContainsBLSPubKey(blsPubKey []byte, except address.Address) bool

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BLSPubKeyOwner to return the owner address.

PoP coverage in the Go action layer (envestcc/iotex-core#4854) wasn't
visible from the web3 path: candidateRegisterWithBLS /
candidateUpdateWithBLS ABI entries have no blsPop slot, so any tx
routed through eth_sendRawTransaction and decoded via
NewCandidateRegister/UpdateFromABIBinary silently dropped the field.
Post-fork that would have produced txs the handler always rejects.

Adds two new V2 ABI methods rather than mutating the existing
selectors:

- candidateRegisterWithBLSAndPoP  (8 fields + blsPop + data)
- candidateUpdateWithBLSAndPoP    (4 fields + blsPubKey + blsPop)

Why V2 instead of extending the existing methods: changing a
parameter list changes the 4-byte function selector, breaking any
tooling that hardcoded the old ABI. The legacy WithBLS entries stay
working pre-fork (their handler path is unchanged), and post-fork they
reject naturally for lacking PoP — coexistence with no selector churn.

EthData routing picks the entry by data carried on the action:

- WithBLS && len(blsPop) > 0  → V2 selector, calldata includes PoP
- WithBLS                      → legacy selector, no PoP slot
- otherwise                    → non-BLS legacy candidateRegister

FromABIBinary recognises the V2 selector and decodes blsPop. Tests:
extended TestCandidateRegisterABIEncodeAndDecode with a "with public
key and PoP" subcase, and added TestCandidateUpdate / "ABI encode with
PoP" — both confirm the codec round-trips a non-empty PoP through
Pack + Unpack, locking in the property whose absence was the bug.

Out of scope for this commit: e2e coverage that exercises the web3
path through to a post-fork handler (proves the new selector actually
flows blsPop into Candidate.BLSPubKey assignment). Will follow once
the EnforceBLSPoP gate gets switched on in an integration scenario.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@envestcc

Copy link
Copy Markdown
Member Author

Pushed `ab96bbc4b` — closes the web3-path gap you flagged.

What was broken

The PoP changes earlier in this PR live in the Go action layer + proto + handler. The ABI layer (which web3 callers actually use via `eth_sendRawTransaction`) had no `blsPop` slot:

  • `native_staking_contract_abi.json` defined `candidateRegisterWithBLS` with 8 fields (no PoP) and `candidateUpdateWithBLS` with 4 fields (no PoP).
  • `CandidateRegister.EthData()` / `CandidateUpdate.EthData()` packed only those 8 / 4 fields — `cr.blsPop` was never encoded into calldata.
  • `NewCandidateRegister/UpdateFromABIBinary` unpacked only those 8 / 4 fields — even if a client somehow embedded PoP bytes, the decoder would have ignored them.
  • e2etest `SignedCandidateRegisterWithBLS` / `SignedCandidateUpdateWithBLS` were called with `blsPop = nil` everywhere, so the missing codec coverage went undetected.

Post-fork impact would have been: every web3-routed register/update with BLS — wallets, SDKs, `ioctl` flows going through eth-tx — gets rejected by the handler for missing PoP, with no way to provide it from the client side.

What changed

Added two new V2 ABI entries (your option B):

Method Inputs
`candidateRegisterWithBLSAndPoP` name, operatorAddress, rewardAddress, ownerAddress, duration, autoStake, blsPubKey, blsPop, data
`candidateUpdateWithBLSAndPoP` name, operatorAddress, rewardAddress, blsPubKey, blsPop

Coexistence model:

  • Legacy `candidateRegisterWithBLS` / `candidateUpdateWithBLS` keep their selector IDs. Tooling that hardcodes them keeps working pre-fork.
  • Post-fork the legacy entries are still accepted at ABI decode time, but the handler rejects them for lacking PoP — clients must switch to the V2 selector to register.

`EthData()` routes by content:

  • `WithBLS() && len(blsPop) > 0` → V2 selector
  • `WithBLS()` → legacy selector (pre-fork OK, post-fork rejected at handler)
  • otherwise → non-BLS `candidateRegister`

`FromABIBinary` adds the V2 selector recognition + decodes `blsPop`.

Tests

  • `TestCandidateRegisterABIEncodeAndDecode` gets a new "with public key and PoP" subcase that round-trips a non-empty PoP through Pack/Unpack via the V2 method.
  • `TestCandidateUpdate` gets "ABI encode with PoP" doing the same on the update side.

Both fail without the EthData/FromABIBinary changes — they're regression guards for exactly the bug you spotted.

Still out of scope

End-to-end web3 → handler test (proves the V2 selector flows blsPop all the way into `Candidate.BLSPubKey` post-fork, and that legacy selector + post-fork rejects properly). Will follow once we wire up the EnforceBLSPoP gate in a proper integration scenario — that's the same fork-activation work as the `ioctl` PoP-generation flag.

…ning

The previous commit added candidateRegisterWithBLSAndPoP /
candidateUpdateWithBLSAndPoP entries to native_staking_contract_abi.json
but left native_staking_contract_interface.sol — the human-readable
source-of-truth that clients / docs consume — out of sync.

Adds the matching Solidity declarations and a header comment in both
the .sol file and native_staking_contract_abi.go pointing out the
non-obvious sharp edge: there is no Makefile target that regenerates
the JSON from the .sol, so the two files must be kept in sync by hand
on every ABI change. Out-of-sync edits silently misencode the web3
path — clients embed one function signature while the node decodes
another.

No runtime behaviour change; the JSON has been authoritative all along.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@envestcc

Copy link
Copy Markdown
Member Author

Good catch — pushed `98bdaa969` mirroring the V2 PoP methods in `native_staking_contract_interface.sol` and adding a sync-warning header to both files.

Findings

  • `action/native_staking_contract_interface.sol` is the human-readable source-of-truth (clients / docs consume it)
  • `action/native_staking_contract_abi.json` is what iotex-core actually loads via `NativeStakingContractABI()` (`go:embed` + parse at init)
  • Makefile has no target regenerating one from the other — confirmed by grepping for `solc` / `abigen` / staking-related targets. The two are kept in sync by hand.

What's in the commit

  1. `candidateRegisterWithBLSAndPoP` and `candidateUpdateWithBLSAndPoP` Solidity declarations added to the .sol interface, mirroring the JSON I added in `ab96bbc4b`.
  2. Header comment at the top of `native_staking_contract_interface.sol` calling out the no-auto-gen invariant.
  3. Header comment in `native_staking_contract_abi.go` (above the `//go:embed` directive) saying the same thing — so anyone touching the JSON via Go-tooling code sees it.

The warning text is the same in both: "Adding a new method, parameter, or event requires editing BOTH files by hand, otherwise the web3 path silently misencodes — clients embed one signature while the node decodes another. There is no Makefile target that catches the drift."

On the lack of a Makefile target

It would be cleaner to wire one up (`solc --abi` → generated JSON, fail CI on drift). Out of scope for this PR — would need a solc version pin, an addition to the dev-deps target, and a CI step. Worth filing as a follow-up if the team wants. For now the comment is the cheapest guard.

Addresses two review threads on PR iotexproject#4854:

## Comments 1 + 5: drop the `except` parameter

Both envestcc and CoderZhi flagged that ContainsBLSPubKey(pubkey,
except) bool pushed the "is this me?" decision into a generic helper
that doesn't belong there.

Replaces ContainsBLSPubKey(pubkey, except) bool with the broader
GetByBLSPubKey(pubkey) *Candidate: callers receive the (possibly nil)
holder and compare identifiers themselves. The three register / update
handler call sites now read

    if holder := csm.GetByBLSPubKey(act.BLSPubKey()); holder != nil &&
        holder.GetIdentifier().String() != c.GetIdentifier().String() {
        return ErrCandidateConflict
    }

which is more explicit than the prior `except`-hiding API. It also
unifies with the same GetByBLSPubKey method added in iotexproject#4857 (Y4a) so
the two PRs don't introduce parallel BLS-pubkey-lookup APIs.

## Comments 2 + 4: strict nil-candidateID contract

Both BLSPopSigningRoot and the surrounding Sign / Verify helpers used
to silently accept a nil candidateID by skipping the candidate-binding
write. That degrades the scheme to domain+pubkey-only — exactly the
shape an attacker reaching for a cross-candidate replay would hope
for.

The three entry points now refuse:

  - BLSPopSigningRoot returns nil
  - SignBLSPop returns error("nil candidate ID; PoP must bind to a
    candidate identity")
  - VerifyBLSPop rejects before any cryptographic work

TestBLSPop_RejectNilCandidateID locks the contract in place.

## Not in this commit

Comment 3 ("blsPubKey can be removed from the signing root") —
deferred. Will reply on the thread; the short answer is that the
IRTF BLS draft defines canonical PoP as sign(sk, pk) and dropping
blsPubKey from the digest opens up same-message aggregation when
owner-uniqueness is ever relaxed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@envestcc

Copy link
Copy Markdown
Member Author

Pushed `9229438ac` addressing 4 of the 5 review comments. The 5th (Comment 3 about `blsPubKey` in the signing root) I'm leaving alone and will reply on the thread separately.

What changed

Comments 1 + 5 → `GetByBLSPubKey` replaces `ContainsBLSPubKey`

`ContainsBLSPubKey(pubkey, except) bool` hid the "is this me?" decision in a helper. Replaced with `GetByBLSPubKey(pubkey) *Candidate` — callers get the (possibly nil) holder and compare identifiers themselves. The three register / update handler call sites now read:

```go
if holder := csm.GetByBLSPubKey(act.BLSPubKey()); holder != nil &&
holder.GetIdentifier().String() != c.GetIdentifier().String() {
return ErrCandidateConflict
}
```

Two side benefits:

Comments 2 + 4 → strict nil-candidateID

`BLSPopSigningRoot` / `SignBLSPop` / `VerifyBLSPop` used to silently accept nil candidateID by skipping the candidate-binding write — which degrades the scheme to domain+pubkey-only and is exactly the shape an attacker reaching for cross-candidate replay would hope for. The three entry points now refuse:

  • `BLSPopSigningRoot` returns nil
  • `SignBLSPop` returns `error("nil candidate ID; PoP must bind to a candidate identity")`
  • `VerifyBLSPop` rejects before any cryptographic work

`TestBLSPop_RejectNilCandidateID` locks the contract.

Test plan

  • `TestBLSPop_RejectNilCandidateID` covers all three entry points
  • `TestCandidateCenter_GetByBLSPubKey` covers the new lookup (nil / empty / unregistered / hit)
  • `go test ./action/protocol/staking/ ./action/` — 476 passing
  • `go build ./...` clean

Addresses Comment 3 on PR iotexproject#4854 (CoderZhi). The previous signing root
included blsPubKey:

    H(blsPopDomain || blsPubKey || candidateID)

per the IRTF BLS canonical PoP form. After discussion, switching to:

    H(blsPopDomain || candidateID)

The pairing verifier Verify(PK, msg, sig) already commits PK into the
signature equation, so the basic rogue-key registration ("register
pk_rogue without owning its discrete log") is blocked by basic PoP
correctness without needing pubkey-in-message. Same-message
aggregation — the classical attack that pubkey-in-message defends
against — requires two distinct honest signers to sign the same
signing root; the protocol-enforced uniqueness of candidate
identifiers rules that out. Cross-candidate replay is still blocked
by the candidateID binding, and cross-domain replay by blsPopDomain.

The simpler signing root keeps the on-the-wire calldata independent
of pubkey-length/encoding choices and removes an apparent redundancy
between the signed message and the verifier's pk argument.

SignBLSPop no longer derives pk from sk; the only caller-provided
binding is candidateID. The bls_pop bytes themselves are unchanged in
length and DST.

Updated TestBLSPop_RogueKeyAttackBlocked and TestBLSPop_RejectNilCandidateID
to use the new BLSPopSigningRoot(candidateID) signature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@envestcc envestcc marked this pull request as ready for review June 18, 2026 01:03
@envestcc envestcc requested a review from a team as a code owner June 18, 2026 01:03
…-pop

Wires the three-option model agreed on PR iotexproject#4854: how does a delegate
actually supply the BLS proof-of-possession the post-fork handler now
requires?

## stake2 register / update

BREAKING positional change: BLS_PUBKEY moves out of the positional
arg list. Pre-fork callers passing it positionally must migrate; the
post-fork handler requires PoP anyway so this command was always going
to need re-tooling.

Three-option matrix lives in blspop_helper.go and applies to both
register (mandatory BLS) and update (optional BLS, default = don't
touch):

  Option 1 — auto-derive from signer (default for register):
    No --bls-* flags. The tool decrypts the signer keystore, derives a
    BLS key from the ECDSA scalar (same algorithm as `ioctl account
    blskey` — crypto.GenerateBLS12381PrivateKey(ecdsa_sk.Bytes())),
    computes the PoP, and prompts the user to confirm. The prompt is
    explicit about ECDSA↔BLS coupling so delegates aren't surprised
    when an ECDSA leak also exposes their producer identity. --yes
    suppresses for CI.

  Option 2 — --bls-priv-key HEX:
    Standalone BLS key supplied directly. Tool derives the pubkey and
    signs the PoP. No prompt.

  Option 3 — --bls-pubkey HEX + --bls-pop HEX:
    User supplies both. Tool VerifyBLSPop's locally against the
    candidateID before submission so a malformed PoP fails fast
    without burning gas. For register the candidateID is the owner
    address from positional args; for update it must be passed via
    --candidate-id.

  Option 0 (update only): no BLS flags at all → BLS untouched.
    Going from "leave BLS alone" to "rotate to derived key" requires
    explicit --bls-from-signer so a name/operator update can never
    silently rotate the producer key.

Mutual-exclusion + completeness rules are centralised in
blsPoPFlags.classifyForRegister / classifyForUpdate and locked in by
TestBlsPoPFlags_ClassifyForRegister / ClassifyForUpdate.

--bls-keystore is a placeholder; the iotex BLS keystore format isn't
specified yet. Tool returns an explicit error pointing at --bls-priv-key.

## account bls-sign-pop

New offline PoP signer for air-gapped workflows. Accepts either
--bls-priv-key HEX or --signer ADDRESS (ECDSA keystore → derive), plus
--candidate-id ADDRESS. Writes the 96-byte PoP hex to stdout (informational
fields go to stderr so shell redirection captures only the PoP). The
delegate signs on the cold machine, transfers the hex over USB / QR, and
runs `ioctl stake2 register --bls-pubkey ... --bls-pop ...` on a hot
machine — the BLS private key never leaves cold storage.

Lives in the account package next to the existing `account blskey` so
BLS key operations are co-located rather than scattered.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@envestcc

Copy link
Copy Markdown
Member Author

Pushed `b80f0f316` — ioctl UX for the BLS PoP path.

What lands

`ioctl stake2 register`

Three-option matrix:

  • Option 1 (default) — no `--bls-*` flags: tool derives BLS from the signer's ECDSA key, computes PoP, prompts the user to confirm the derived pubkey + the ECDSA↔BLS coupling warning. `--yes` for CI.
  • Option 2 — `--bls-priv-key HEX`: standalone BLS key.
  • Option 3 — `--bls-pubkey HEX --bls-pop HEX`: explicit, local `VerifyBLSPop` before submission.

BREAKING: `BLS_PUBKEY` moves out of positional args (was `args[4]`) into flags. Pre-fork pre-existing scripts need updating, but post-fork they need re-tooling anyway since they must supply PoP.

`ioctl stake2 update`

Same three options PLUS Option 0:

  • Option 0 — no BLS flags: BLS untouched (the common case when changing only name / operator / reward).
  • Option 1 for update requires explicit `--bls-from-signer` opt-in so a name-only update never silently rotates the producer key.
  • Update path needs `--candidate-id` for any non-Option-0 mode (RPC fallback is a future improvement).

`ioctl account bls-sign-pop` (new)

Offline PoP generator. Takes `--bls-priv-key HEX` OR `--signer ADDRESS` (the latter decrypts the ECDSA keystore + derives BLS), plus `--candidate-id ADDRESS`. Writes 96-byte PoP hex to stdout (informational fields to stderr so `> pop.hex` captures just the PoP). Air-gap flow:

```bash

cold machine

ioctl account bls-sign-pop --candidate-id io1owner... --bls-priv-key 0x... > pop.hex

USB / QR to hot machine

ioctl stake2 register mycand io1op io1reward io1owner 10000 350 \
--bls-pubkey 0x... --bls-pop $(cat pop.hex) \
--signer io1owner --auto-stake
```

Why hot/cold separation matters here

BLS sk derived from ECDSA sk means a single compromise blast both. The `--yes`-able prompt makes the coupling explicit so delegates choosing the easy path know what they signed up for; security-conscious delegates use Option 2 / 3 with separately-managed BLS material.

Tests

  • `TestBlsPoPFlags_ClassifyForRegister` — 6 cases: auto-derive default, explicit key, explicit PoP, partial input rejection, mutual-exclusion rejection
  • `TestBlsPoPFlags_ClassifyForUpdate` — 8 cases: Option 0 + each of 1/2/3 + reject branches including `--bls-from-signer + --bls-priv-key` mutual exclusion
  • `TestLoadBLSPrivKey` — hex parse: good / 0x-prefixed / empty / non-hex / wrong-length

Deferred

  • `--bls-keystore PATH` — placeholder, errors out pointing at `--bls-priv-key`. iotex doesn't have a BLS keystore format yet; separate RFC.
  • RPC fallback to resolve `--candidate-id` from the signer on update. Requires a gRPC roundtrip; for v1 the user supplies it explicitly.
  • e2e test exercising the whole chain (decrypt → derive → PoP → V2 ABI → handler accept) — depends on `EnforceBLSPoP` activation fixture.

Out of scope (separate PR territory)

  • Migration command for existing pre-PoP BLS registrations (just re-run `stake2 update --bls-from-signer` — no new command needed).
  • JS / Go SDK exposure of the same helpers — that lands in the respective SDK repos.

🤖 Generated with Claude Code

PR iotexproject#4854 had no handler-level integration coverage of the
EnforceBLSPoP gate — only the underlying SignBLSPop / VerifyBLSPop
unit tests in bls_pop_test.go — and no e2etest of the post-fork
register path at all. Reviewers couldn't confirm the three handler
sites actually fail-closed under the gate without running the full
chain manually.

## Handler integration (action/protocol/staking/handlers_blspop_test.go)

Three new test functions, 11 subcases:

- TestHandleCandidateRegister_PoPGate
  - gate on + valid PoP → ReceiptStatus_Success, c.BLSPubKey persisted
  - gate on + empty PoP → ErrUnauthorizedOperator
  - gate on + PoP signed under wrong key → ErrUnauthorizedOperator
  - gate off + empty PoP → Success (pre-fork behaviour preserved)

- TestHandleCandidateRegister_BLSPubKeyUniqueness
  - second candidate registering the same BLS pubkey with a valid
    PoP under its own owner → ErrCandidateConflict; the
    GetByBLSPubKey uniqueness check protects IIP-52's quorum-counting
    invariant against the "shared keypair" silent break

- TestHandleCandidateUpdate_PoPGate
  - gate on + valid PoP under c.GetIdentifier() → rotation succeeds,
    new BLSPubKey in state
  - gate on + empty PoP → ErrUnauthorizedOperator
  - gate on + PoP signed under wrong candidateID → ErrUnauthorizedOperator
  - gate off + empty PoP → rotation succeeds (pre-fork compat)

The fixture flips EnforceBLSPoP via genesis.ToBeEnabledBlockHeight
(0 / math.MaxUint64) and forces XinguBlockHeight = 0 so the BLS
registration codepath is reachable at the test block.

## e2etest (e2etest/native_staking_test.go)

TestCandidateBLSPoP exercises the full chain pipeline — encode →
submit → mint → handler → receipt → state — with four post-fork
subcases plus a pre-fork backward-compat anchor:

  1. Pre-fork register without PoP succeeds
  2. Post-fork register with valid PoP succeeds, BlsPubKey persists
  3. Post-fork register without PoP returns ErrUnauthorizedOperator
  4. Post-fork update rotates BLS pubkey with valid PoP
  5. Post-fork update without PoP returns ErrUnauthorizedOperator
     and leaves the previously rotated pubkey untouched in state

Genesis wires ToBeEnabledBlockHeight == XinguBlockHeight so the BLS
register path and the PoP gate activate together — the same shape
the planned fork rollout will use.

Pre-existing TestProtocol_FetchBucketAndValidate flake (master and
this branch both flap ~1/3) is unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@envestcc

Copy link
Copy Markdown
Member Author

Pushed `143010dfe` — handler integration + e2etest coverage for the PoP gate.

Handler integration (`handlers_blspop_test.go`, new)

3 functions, 11 subcases covering EnforceBLSPoP at all three handler sites:

TestHandleCandidateRegister_PoPGate — gate on/off × {valid PoP, empty PoP, PoP signed by wrong key}. The "wrong key" subcase exercises the rogue-key blocking via the pairing verifier, separate from the unit-test-level proof in `bls_pop_test.go`.

TestHandleCandidateRegister_BLSPubKeyUniqueness — second candidate trying to register the same BLS pubkey with a fresh valid PoP under its own owner → ErrCandidateConflict. Locks the GetByBLSPubKey invariant in at the handler boundary.

TestHandleCandidateUpdate_PoPGate — rotation under gate on/off × {valid PoP, empty PoP, PoP for wrong candidateID}. The "wrong candidateID" subcase confirms the cross-candidate replay defence (the binding lives in BLSPopSigningRoot, not at the protocol layer above).

Fixture flips the gate via `genesis.ToBeEnabledBlockHeight` (0 vs `math.MaxUint64`) and forces XinguBlockHeight = 0 so the BLS register/update codepath is reachable.

e2etest (`native_staking_test.go`)

TestCandidateBLSPoP — full chain pipeline encode → submit → mint → handler → receipt → state, with 5 subcases:

  1. Pre-fork register without PoP → Success (backward compat anchor)
  2. Post-fork register with valid PoP → Success, `Candidate.BlsPubKey` persisted in state
  3. Post-fork register without PoP → ErrUnauthorizedOperator
  4. Post-fork update rotates BLS pubkey with valid PoP → `Candidate.BlsPubKey` reflects the new key
  5. Post-fork update without PoP → ErrUnauthorizedOperator, previous BLS pubkey untouched in state

Genesis wires `ToBeEnabledBlockHeight == XinguBlockHeight` so the BLS register path and the PoP gate activate together — same shape as the planned fork rollout. Each test asserts both the receipt status AND the resulting candidate state, so a silent state mutation under a failure receipt wouldn't slip past.

Pre-existing flake

`TestProtocol_FetchBucketAndValidate/validate_owner_and_selfstake` flakes about 1 in 3 runs on this branch — and confirmed earlier it also flakes on master. Not introduced by this PR.

Test plan

  • `go test -count=1 -run "TestHandleCandidateRegister_PoPGate|TestHandleCandidateRegister_BLSPubKeyUniqueness|TestHandleCandidateUpdate_PoPGate" ./action/protocol/staking/` — 11/11 passing
  • `go test -count=1 -run "TestCandidateBLSPoP" ./e2etest/` — 5/5 passing
  • `go test -count=1 ./action/protocol/staking/ ./action/ ./ioctl/cmd/action/` — clean except for the pre-existing flake noted above
  • `go build ./...` clean

@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants