fix(staking): harden markValidatorForDeletion (jailed-delete race + non-jailed divide-by-zero)#16
Open
vitaly-andr wants to merge 1 commit into
Conversation
markValidatorForDeletion mishandled validator removal in both branches: 1. Jailed branch deleted the validator record and the ValidatorByConsAddr index immediately. CometBFT keeps the validator in LastCommit for ValidatorUpdateDelay blocks (effective 2-block lag), and slashing.BeginBlocker -> GetValidatorByConsAddr then fails with ErrNoValidatorFound, halting the chain. 2. Non-jailed branch zeroed DelegatorShares alongside Tokens. TokensFromShares divides by DelegatorShares, so a MsgUnjail (or any other TokensFromShares caller) in the same block as the marking -- before DeleteZeroPowerValidators removes the record on the next block -- hits a divide-by-zero panic. Both paths now zero Tokens but keep DelegatorShares (and, for the jailed path, keep the record + index): power becomes zero, the existing DeleteZeroPowerValidators defer-path removes the record on a later block after CometBFT has dropped it, and TokensFromShares returns 0 cleanly. Adds TestMarkValidatorForDeletion_JailedRemoval_KeepsValidatorReachable and TestMarkValidatorForDeletion_NonJailedRemoval_KeepsDelegatorShares; both fail on current release/v0.53.x and pass with the fix. The non-jailed divide-by-zero was confirmed by @0xgonka (GON-191 author) on gonka-ai#14 as worth fixing in the same PR (identical fix shape). Surfaced via gonka-ai/gonka#982 (inference-chain simulation/fuzz testing); filed with a standalone reproducer as gonka-ai/gonka#1205. Signed-off-by: Vitaly Andrianov <vitaly.andr@gmail.com>
ad4a98b to
46ca57b
Compare
vitaly-andr
added a commit
to vitaly-andr/gonka
that referenced
this pull request
May 29, 2026
…zzing, secondary-op factories Completes gonka-ai#982 Phase 3 (x/inference hardening) on top of the second-wave ops: - Custom invariants via RegisterInvariants (bank-backs-positive-balance solvency, no-stuck-voting, effective-epoch-fresh, active-invalidations-ref-live) + unit tests. - Store decoders for the simulation decode harness (Participant / Inference / KeySet / Uint64 / Int64 / legacy-params, hex fallback) + tests. - Aggressive parameter-boundary fuzzing (MutateSimParams): economic params to Validate() boundaries, widened EpochParams, corner profiles. Slashing- activation levers held at defaults — active slashing drains the Tokens=1 sim substrate. - Three real secondary-op factories (SubmitUnitOfComputePriceProposal, SubmitHardwareDiff, SubmitNewUnfundedParticipant) replacing NoOp stubs. - Data-driven weight tuning; a model-weight transient-cache test helper. - docs/simulation.md: Phase-3 status, fuzzing design, invariant list, sweep. The full 500-block sim run is green only with the production fixes it surfaced: gonka-ai#1275 (VOTING-timeout refund), gonka-ai#1276 (revalidation membership guard), and gonka-ai/cosmos-sdk#16 (markValidatorForDeletion). Verified combined: sim-full 500x200 seed=99 PASS, app-hash f76df5cd…. The post-run bank-backs invariant holds in the canonical 500x200 run (transient intra-epoch under-backing heals via debt-recovery by settlement); the asymmetric refund-debit drift it surfaces under shorter/aggressive runs is documented as a design question in gonka-ai#1273 (TestRefundInvalidation_LeavesAccountingDrift). Signed-off-by: Vitaly Andrianov <vitaly.andr@gmail.com>
This was referenced May 29, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
markValidatorForDeletionmishandled validator removal in both branches. Same function, same defect class (zeroing state in a way that breaks downstream consumers before the record is gone), so fixed together.1. Jailed branch — chain halt
The jailed branch called
deleteValidatorInternal, removing the validator record and theValidatorByConsAddrindex immediately. CometBFT keeps the validator inLastCommitforValidatorUpdateDelayblocks (effective 2-block lag —state/execution.gonext-next-height plus theNextValidatorsdouble-buffer). During that windowslashing.BeginBlocker → IsValidatorJailed → GetValidatorByConsAddrfails withErrNoValidatorFound, halting the chain.2. Non-jailed branch — divide-by-zero panic
The non-jailed branch zeroed
DelegatorSharesalongsideTokens.TokensFromShares(validator.go:308) computesshares × Tokens / DelegatorShares, so aMsgUnjail(or any otherTokensFromSharescaller) landing in the same block as the marking — beforeDeleteZeroPowerValidatorsremoves the record on the next block — hits a divide-by-zero panic.Fix
Both paths now zero
Tokensbut keepDelegatorShares(and, for the jailed path, keep the record + index). Power becomes zero, the existingDeleteZeroPowerValidatorsdefer-path removes the record on a later block — afterLastValidatorPowerclears and CometBFT has dropped the validator — andTokensFromSharesreturns 0 cleanly.Tests
TestMarkValidatorForDeletion_JailedRemoval_KeepsValidatorReachableTestMarkValidatorForDeletion_NonJailedRemoval_KeepsDelegatorSharesBoth fail on current
release/v0.53.xand pass with the fix.Context
Surfaced while implementing simulation/fuzz testing for inference-chain (gonka-ai/gonka#982); filed with a standalone reproducer as gonka-ai/gonka#1205.
Independent of and complementary to GON-191 (#14, @0xgonka) — a different function (
markValidatorForDeletion) than GON-191's (filterBasedOnExisting/deleteValidatorInternal). The two were verified to compose without conflict (full simulation pass with both patches, height=501); coordination on #14: #14 (comment). In that thread @0xgonka confirmed the non-jailed divide-by-zero (#2 above) and recommended fixing it in this PR (identical fix shape).cc @gmorgachev — fork maintainer and author of
markValidatorForDeletion(both branches changed here: the jailed delete incefc31d3a8e, the non-jailedDelegatorShareszeroing in0d28c0a1e2e)