fix(ci): stop fuzz jobs from oversubscribing the CPU #21517
Merged
Each fuzz recipe runs N targets concurrently via `parallel`, and each `go test -fuzz` defaults to GOMAXPROCS workers, which yields N×N worker processes on an N-core CI box. A starved worker can miss the `-fuzztime` deadline and fail the target with a bare `context deadline exceeded` (not a real crash). Pass `-parallel=1` so each target uses a single fuzz worker, leaving parallelism only across targets. Fixed in the shared `go_fuzz` helper (op-node, op-batcher, op-chain-ops, op-service, op-challenger) and the two bespoke recipes (cannon, op-e2e). The duration-based `-fuzztime` budgets are unchanged, so CI wall-time is unaffected.
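The worker-count arithmetic can be sketched in a few lines of Go. This is only an illustration, not code from the PR: `totalFuzzWorkers` is a hypothetical helper, and the sketch assumes the number of concurrently running targets equals the core count, as the description states.

```go
package main

import (
	"fmt"
	"runtime"
)

// totalFuzzWorkers is a hypothetical helper: it returns how many fuzz worker
// processes exist at once when `targets` fuzz targets run concurrently and
// each one starts `workersPerTarget` workers.
func totalFuzzWorkers(targets, workersPerTarget int) int {
	return targets * workersPerTarget
}

func main() {
	// Passing 0 reads the current GOMAXPROCS value without changing it.
	cores := runtime.GOMAXPROCS(0)

	// Default: every target starts GOMAXPROCS workers.
	fmt.Printf("default:          %d workers on %d cores\n", totalFuzzWorkers(cores, cores), cores)

	// With -parallel=1: every target starts exactly one worker.
	fmt.Printf("with -parallel=1: %d workers on %d cores\n", totalFuzzWorkers(cores, 1), cores)
}
```

On an 8-core machine the first figure is 64 and the second is 8, which matches the oversubscription the fix is meant to remove.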
digorithm approved these changes on Jun 23, 2026
Problem
Go fuzz CI jobs (`cannon-fuzz`, `fuzz-golang-*`, `op-e2e-fuzz`) intermittently fail with a bare `context deadline exceeded` at the `-fuzztime` boundary, not a real crash. Three such failures on 2026-06-22 across two jobs (e.g. `FuzzStateHintRead`, `FuzzEncodeDecodeWithdrawal`); ~0 failures in the prior 745+ runs per job.
Cause
Each fuzz recipe runs N targets concurrently via `parallel`, and each `go test -fuzz` defaults its worker count to `GOMAXPROCS`. On the 8-core CI box that's 8 × 8 = 64 fuzz worker processes. Under that oversubscription a worker can be starved of CPU and miss the `-fuzztime` deadline, which the engine reports as `context deadline exceeded`.
Fix
Pass `-parallel=1` so each target uses a single fuzz worker; the only parallelism left is across targets (≈ cores, no oversubscription). Applied in the shared `go_fuzz` helper (covers op-node, op-batcher, op-chain-ops, op-service, op-challenger) and the two bespoke recipes (cannon, op-e2e).
The duration `-fuzztime` budgets are unchanged, so wall-time is unaffected (verified locally: identical wall-time, no throughput regression).
This is a CI-scheduling flake reproducible only under contention, so there's no unit-test regression guard.
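For illustration, a single-worker fuzz invocation could be assembled as below. This is a sketch under stated assumptions, not the actual recipe from the PR: `fuzzArgs` is a hypothetical helper, and the `10s` budget and `./pkg` package path are placeholders. The `-run=^$` pattern is an extra assumption here, a common way to skip regular tests so that only the fuzz target runs.

```go
package main

import (
	"fmt"
	"strings"
)

// fuzzArgs is a hypothetical helper that builds the arguments for one
// `go test` fuzz run limited to a single fuzz worker.
func fuzzArgs(target, fuzztime, pkg string) []string {
	return []string{
		"test",
		"-run=^$",                // match no regular tests (assumption, not from the PR)
		"-fuzz=^" + target + "$", // anchor the pattern so it selects exactly one target
		"-fuzztime=" + fuzztime,  // duration budget, left unchanged by the fix
		"-parallel=1",            // one fuzz worker instead of GOMAXPROCS
		pkg,
	}
}

func main() {
	// The budget and package path are placeholders for illustration only.
	args := fuzzArgs("FuzzStateHintRead", "10s", "./pkg")
	fmt.Println("go " + strings.Join(args, " "))
}
```

Running N such commands concurrently then gives roughly one fuzz worker per core, instead of N workers per target.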
Closes #21516