Problem
After computing round polynomials and deriving the challenge, each sumcheck instance's bindChallenge runs as a separate dispatch:
```
parallelForForce(3, bindRegsVal) → barrier
parallelForForce(41, bindAllLookupTables) → barrier
parallelForForce(bytecode_d+1, bindBcRaf) → barrier
parallelForForce(N+1, bindBooleanity) → barrier
...
```
Each bind operation is independent across instances: all instances bind with the same challenge value and operate on disjoint memory. Yet the dispatches run one after another, with a barrier after each.
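A minimal Python sketch of the current pattern helps make the barrier count concrete. It assumes each bind folds a dense multilinear evaluation table in half at challenge `r` with the low/high-half rule `a[i] = a[i] + r*(a[i+half] - a[i])` over a stand-in prime field. `bind_in_place`, `parallel_for_force`, and the modulus `P` are hypothetical illustrations, not the prover's Zig code, and the thread pool only models the dispatch structure (Python threads do not run this arithmetic in parallel).

```python
from concurrent.futures import ThreadPoolExecutor

P = 2**31 - 1  # hypothetical prime modulus standing in for the real scalar field


def bind_in_place(table, r):
    """Fold a multilinear evaluation table in half at challenge r (assumed low/high-half convention)."""
    half = len(table) // 2
    for i in range(half):
        lo, hi = table[i], table[i + half]
        table[i] = (lo + r * (hi - lo)) % P
    del table[half:]


def parallel_for_force(pool, n, fn):
    """Hypothetical stand-in for parallelForForce: run fn(0), ..., fn(n-1) on the pool, then wait."""
    futures = [pool.submit(fn, i) for i in range(n)]
    for f in futures:
        f.result()  # waiting for every item is the barrier


def bind_per_instance(pool, instances, r):
    """Current pattern: one dispatch, and therefore one barrier, per sumcheck instance."""
    barriers = 0
    for arrays in instances:
        parallel_for_force(pool, len(arrays), lambda j, arrays=arrays: bind_in_place(arrays[j], r))
        barriers += 1
    return barriers


if __name__ == "__main__":
    # Toy instances with 3, 5, and 4 arrays of 8 evaluations each (sizes are illustrative).
    instances = [[list(range(8)) for _ in range(k)] for k in (3, 5, 4)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        print(bind_per_instance(pool, instances, r=7))  # 3 dispatches, so 3 barriers
```

Every instance waits for the previous one's barrier even though none of them reads another's arrays.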
Proposed solution
Fuse all instance bindings into a single dispatch. Instead of N separate parallelForForce calls (one per instance), create a unified dispatch that distributes bind work across all instances:
```
// Single dispatch: each work item claims one (instance, array_index) pair
parallelForForce(total_bind_arrays, bindAnyArray) → 1 barrier
```
Where total_bind_arrays is the number of bind arrays summed over all instances (e.g., 3 + 41 + 5 + 4 + ... ≈ 60 arrays in Stage 5).
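One way to implement the flat dispatch is sketched below in Python, assuming the same halving bind for every array and a hypothetical `parallel_for_force` helper that waits for all items before returning. A prefix sum over per-instance array counts gives each instance an offset into a single index space, and each work item maps its flat index `k` back to an `(instance, array_index)` pair by binary search. `build_offsets`, `locate`, `bind_any_array`, and the modulus `P` are illustrative names, not existing functions in the prover.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate

P = 2**31 - 1  # hypothetical prime modulus standing in for the real scalar field


def bind_in_place(table, r):
    """Fold a multilinear evaluation table in half at challenge r (assumed low/high-half convention)."""
    half = len(table) // 2
    for i in range(half):
        table[i] = (table[i] + r * (table[i + half] - table[i])) % P
    del table[half:]


def parallel_for_force(pool, n, fn):
    """Hypothetical stand-in for parallelForForce: run fn(0), ..., fn(n-1), then wait (barrier)."""
    for f in [pool.submit(fn, i) for i in range(n)]:
        f.result()


def build_offsets(instances):
    """offsets[i] is the first flat index owned by instance i; offsets[-1] is total_bind_arrays."""
    return [0, *accumulate(len(arrays) for arrays in instances)]


def locate(offsets, k):
    """Map flat work-item index k to (instance, array_index); empty instances are skipped."""
    inst = bisect_right(offsets, k) - 1
    return inst, k - offsets[inst]


def bind_fused(pool, instances, r):
    """Proposed pattern: one dispatch over every array of every instance, one barrier."""
    offsets = build_offsets(instances)

    def bind_any_array(k):
        inst, j = locate(offsets, k)
        bind_in_place(instances[inst][j], r)  # each item touches a different array

    parallel_for_force(pool, offsets[-1], bind_any_array)
    return 1  # barriers used for the bind phase this round
```

Since every item binds with the same challenge `r`, the flat index is the only per-item state. A binary search per item is cheap next to folding an array; a real implementation could instead precompute the `(instance, array_index)` table once per stage.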
Fusing balances load better (roughly 60 work items spread across 8 threads, instead of several dispatches of only 3-5 items each that leave threads idle) and eliminates N-1 barrier cycles.
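A back-of-the-envelope model of the load-balancing argument follows. It assumes every array costs one unit of work and that a dispatch of `n` items on `t` threads takes `ceil(n / t)` waves followed by a barrier. Real arrays differ in size, so this only shows the trend. The counts are the known Stage 5 terms (3, 41, 5, 4), with the elided remainder left out.

```python
import math


def waves(n_items, threads):
    """Rounds of work one dispatch needs if every item costs one unit (an idealization)."""
    return math.ceil(n_items / threads)


def per_instance_cost(counts, threads):
    """Current pattern: one dispatch per instance -> (total waves, barriers)."""
    return sum(waves(n, threads) for n in counts), len(counts)


def fused_cost(counts, threads):
    """Fused pattern: one dispatch over all arrays -> (total waves, barriers)."""
    return waves(sum(counts), threads), 1


counts = [3, 41, 5, 4]  # known Stage 5 terms; the "..." remainder is omitted
print(per_instance_cost(counts, threads=8))  # (9, 4)
print(fused_cost(counts, threads=8))         # (7, 1)
```

In this model each small dispatch occupies a full wave while using at most 5 of the 8 threads. The fused dispatch packs those items into the waves of the large one.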
Files
src/zkvm/spartan/stage5_prover.zig — bind operations after each round
src/zkvm/spartan/stage6_prover.zig — bind operations after each round
Interaction
Combined with the "batch instance computation" issue, this reduces per-round barriers from 8-9 to exactly 2 (one after compute, one after bind).