feat: jolt-inline SHA-256 end-to-end (all SHA examples use inlines, verify under Zolt + Jolt)#75
Merged
Merged
Conversation
… fix
This change adds infrastructure for Jolt SDK ELFs and the jolt-inlines/sha2
SHA256 precompile, plus the bug fix that makes SDK fibonacci verify against
Jolt's verifier.
Why
- SDK ELFs (compiled with jolt-sdk) introduced new instruction patterns
(CSRRW/CSRRS/MRET in _start, custom 0x5B advice/host-IO opcodes,
JAL/JALR with rd=x0 remapped to virtual register 40, FENCE) that the
prior Zolt prover did not handle.
- Jolt-inline replaces crypto operations in guest programs with custom
RISC-V opcodes (0x0B); the bytecode preprocessing, lookup tables, and
trace builder all needed to recognize and expand these.
- The single most important bug for SDK ELF verification was a mismatch
between the bytecode val_polys (Stage 4 RegistersRWC) and the trace's
Rs1Ra polynomial: the trace step's `rs1_read` field was false for
custom 0x5B opcodes (VirtualHostIO/Advice*), but the bytecode entry
carried rs1=Some(0). The val_poly therefore added eq_table_4[0] for
these entries while the Stage 4 sumcheck did not, producing diverging
per-stage claims and Stage 6 sumcheck failure.
Fix
- src/tracer/mod.zig: in stepNormal, set `reads_rs1=true` for custom-2
opcode 0x5B with funct3 != 0/5 (the SDK Virtual{HostIO,Advice*} family).
funct3=0/5 are VirtualSRL/SRA which have their own step handlers that
set rs1_read/rs2_read explicitly, so we leave them alone.
- src/zkvm/proving_pipeline.zig: pass device.memory_layout.termination
through to the prover-internal BytecodePreprocessing so the synthetic
termination LUI/ADDI/SD entries match the exporter's. Defensive — does
not by itself cause Stage 6 failure on the test workloads but keeps
the two preps byte-for-byte identical.
Other infrastructure (used by jolt-inline and SDK)
- Decoder: handle opcode 0x0B (custom-0 jolt-inline), 0x5B (SDK custom-2
VirtualHostIO/AdviceLoad/AdviceLen), 0x6B (custom-3 VirtualROTRI/W),
CSRRW/CSRRS/MRET decoding, and JAL/JALR rd=x0 remapping to vr40.
- bytecode_preprocessing.zig: decompose CSRRW/CSRRS/MRET into
ADDI/OR/JALR sequences matching Jolt's tracer's inline_sequence
exactly (vr34..vr39 CSR registers, vr40 temp), and synthetic
termination JAL with rd=vr40.
- Lookup tables: implement VirtualROTR (idx 27) and VirtualROTRW (idx 28)
MLE evaluations and materialize entries.
- Lookups + lookup_trace: add Andn / VirtualROTRI / VirtualROTRIW types
and recorders.
- spartan/bytecode_entries.zig: add ANDN / VirtualROTRI / VirtualROTRIW
entry population, FENCE/ECALL/CSR/MRET handling in
populateEntryFromJoltInstruction, JAL rd=0→vr40 remapping in entry
population, IsLastInSequence post-processing for JALR vsr=0.
- spartan/stage5_*: add 0x6B and 0x5B handling in opcode dispatch.
- r1cs/constraints.zig + trace_witness.zig: 0x5B funct3 guards on the
compact integer R1CS path, 0x6B/0x0B/0x2B handling.
- tracer/sha256_inline.zig: NEW — SHA256 sequence builder ported from
jolt-inlines/sha2 (~500 lines), expands a 0x0B SHA256 instruction
into the ~550 virtual instructions Jolt produces.
- jolt-verifier: pin to a Jolt revision that matches the proof format.
Status
- Plain C ELFs (fibonacci, sha256, sha256_128, sha256_2048, collatz,
primes_large) verify against Jolt's verifier — unchanged.
- examples/fibonacci_sdk.elf VERIFIES against Jolt's verifier — was
previously failing at Stage 6 with SumcheckVerificationError.
- examples/sha256_inline.elf still panics in the prover at
evaluators.zig:1013 (constraint violation in Az/Bz product). This is a
pre-existing issue inside the inline SHA256 sequence; tracked as a
follow-up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lRev8W The R1CS witness's `next_is_virtual` and `is_virtual_opcode` checks treated opcode 0x5B with funct3=0 or funct3=5 as always-virtual. In upstream Jolt's custom-2 opcode space, those funct3 values map to first-class ELF instructions (VirtualRev8W and AdviceLW respectively), not virtual sequences. The bogus "virtual" classification fired R1CS constraint 18 (MustStartSequenceFromBeginning) on every transition from a real ELF instruction into one of these custom-2 instructions, and the prover hit the constraint-satisfaction assertion in interpolateAzBzProductInt while building the witness for sha256_inline.elf. Fix: - src/zkvm/r1cs/trace_witness.zig: drop the 0x5B-funct3=0/5 branches in both `is_virtual_opcode` and `next_is_virtual`. The vsr-based check (`virtual_sequence_remaining > 0` or `is_last_in_sequence`) already catches internal expansions like our SHA256-inline VirtualSRLI sub-steps. Also add the start of VirtualRev8W support (the byte-swap-each-32-bit-half helper used by jolt-inlines/sha2's swap_bytes()): - jolt_instruction.zig + decoder: VirtualRev8W variant decoded from 0x5B funct3=0; AdviceLB/H/W/D variants for funct3=3..6; VirtualAdviceLoad as the inline-sequence target. - spartan/bytecode_entries.zig: populateEntryFromJoltInstruction handler for VirtualRev8W (lookup table 24, AddOperands+WriteLookupOutputToRD, LeftOperandIsRs1Value); SDK NoOp-like handler covers AdviceLB/H/W/D. - instruction/mod.zig: LookupTables.VirtualRev8W variant + materializeEntry. - instruction/lookups.zig: VirtualRev8WLookup type. - instruction/lookup_trace.zig: recordVirtualRev8W. - lookup_table/mod.zig: VirtualRev8W table struct + evaluateMLE; wired into evaluateTableMLE at index 24 (matches Jolt 997c1543's enum ordering). - tracer/mod.zig: stepVirtualRev8W single-cycle handler that computes rd = byte_swap_per_half(rs1) and emits the trace step. Status: - All 6 previously-passing ELFs (fibonacci, sha256, collatz, primes_large, sha256_128, fibonacci_sdk) still verify. - examples/sha256_inline.elf no longer panics in the prover; the prover now generates a complete proof. Verification still fails at Stage 2 because AdviceLW/AdviceLD inline-sequence expansion (VirtualAdviceLoad + SLLI + SRAI) is not yet implemented in the bytecode preprocessor or emulator. Tracked as a follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AdviceL{B,H,W,D} instructions (Jolt SDK custom-2 opcode 0x5B funct3=3..6) are
no longer treated as NoOps in the prover. They now expand to the same virtual
sub-sequence Jolt's tracer produces via inline_sequence():
AdviceLB/LH/LW (64-bit mode): VirtualAdvice → VirtualMULI (SLLI) → VirtualSRAI (SRAI)
AdviceLD: VirtualAdvice (single cycle, already 64-bit)
- src/tracer/mod.zig:
- Emulator gains an `advice_pos` cursor and `adviceTapeRead(num_bytes)` helper
that consumes bytes from `device.untrusted_advice`, returning 0 once the tape
is exhausted (valid witness — advice is externally supplied and only range-
checked by Jolt's proof system).
- New `stepAdviceLoadSignExt` emits the 3-cycle (or 1-cycle for LD) sequence
with correct vsr=2,1,0 and is_first=true,false,false flags, matching Jolt's
finalize() layout. Sign-extension is performed by the shift pair for the
sub-word loads.
- `step()` dispatches 0x5B funct3=3..6 to `stepAdviceLoadSignExt`.
- src/zkvm/bytecode_preprocessing.zig: decode-time expansion of .AdviceLB/LH/LW
into three JoltInstruction entries (VirtualAdvice, VirtualMULI, VirtualSRAI)
and .AdviceLD into one (VirtualAdvice), so the exported bytecode matches the
trace's virtual sub-sequence exactly.
- src/zkvm/proving_pipeline.zig: update `computeBytecodeCodeSizeWithTextSize`
to count 3 entries for AdviceLB/LH/LW and 1 for the other custom-2 variants,
so `bytecode_K` stays in sync with the expanded bytecode array.
Status:
- All 6 previously-passing ELFs still verify (fibonacci, sha256, collatz,
primes_large, sha256_128, fibonacci_sdk).
- examples/sha256_inline.elf still fails verification at Stage 2, but the
prover no longer panics. The remaining Stage 2 failure is in the combined
InstructionLookupsClaimReduction / SpartanProductVirtualization / Ram* batch
and needs further investigation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Our internal VirtualSRLI (emitted by SRLI decomposition) uses opcode 0x5B funct3=0 as its synthetic trace-step encoding, which collides with Jolt's VirtualRev8W instruction that uses the same encoding at the ELF level. Make stepVirtualRev8W emit a synthetic trace instruction word with opcode 0x7B funct3=0 instead, so Stage 5's getLookupTableIndex and the is_identity_path classifier can distinguish it from VirtualSRLI without any bytecode-walking or opcode-overlapping tricks. The serialized bytecode JSON still uses the variant tag .VirtualRev8W so Jolt's verifier interprets it correctly. - src/tracer/mod.zig: stepVirtualRev8W builds a synthetic 0x7B funct3=0 instruction word with rd/rs1 set, rs2/imm=0. - src/zkvm/spartan/bytecode_entries.zig: - populateEntryFromJoltInstruction .VirtualRev8W sets entry.opcode=0x7B - getLookupTableIndex(0x7B, ...) → 24 (VirtualRev8WTable) - isKnownInstruction recognizes 0x7B - src/zkvm/spartan/stage5_instances.zig: getLookupTableIndex(0x7B) → 24, is_identity_path(0x7B) → true, identity-path lookup index = rs1. - src/zkvm/r1cs/trace_witness.zig: extractOperandFlags(0x7B) sets LeftOperandIsRs1Value + AddOperands + WriteLookupOutputToRD. decodeImmediateInt(0x7B) returns 0. computeU128LookupOperandInt(0x7B) returns rs1_value. readsRs1 includes 0x7B. All 6 previously-verifying ELFs still pass. sha256_inline still fails at Stage 2 — that failure is pre-existing and unrelated to this change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dFlags
The compact R1CS witness path's extractOperandFlags had no case for opcode
0x6B (custom-3, VirtualROTRI/VirtualROTRIW), leaving the per-cycle flags all
false. This silently diverged from the bytecode entry's circuit/instruction
flags for any cycle that executed a VirtualROTRI/W virtual instruction — this
primarily matters for jolt-inline SHA256 which emits hundreds of VirtualROTRI
sub-steps via its inline expansion.
- src/zkvm/r1cs/trace_witness.zig:
- extractOperandFlags handles 0x6B with LeftOperandIsRs1Value +
RightOperandIsImm + WriteLookupOutputToRD (interleaved path, matching
VirtualROTRILookup.instructionFlags()).
- decodeImmediateInt reconstructs the bitmask from the rotation amount
stored in the I-type imm field, so Stage 1 val_poly sees the same bitmask
value the bytecode entry carries (populateVirtualROTRI stores the full
64-bit bitmask in entry.imm).
All 6 previously-verifying ELFs still pass. sha256_inline still fails at
Stage 2; the remaining failure is not in the 0x6B path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes the ZOLT_DUMP_TRACE diagnostic added during sha256_inline investigation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Our stepInline handler for the jolt-inline SHA256 expansion set rd_written = (!is_store and rd != 0), excluding rd=0 from RdWa tracking. But Jolt's convention (documented in stepNormal) is that *every* instruction with an rd field writes to rd, even when rd=0 — Jolt captures cpu.x[0] before/after execution and treats the (0→0) transition as a write. Excluding rd=0 causes the Stage 4 Rs1Ra/RdWa sumcheck claims for the SHA256 sub-cycles to diverge from what Jolt's verifier reconstructs from the serialized bytecode entries. - src/tracer/mod.zig: stepInline trace step now sets rd_written=!is_store, matching stepNormal. All 6 previously-verifying ELFs still pass. sha256_inline still fails at Stage 2 (the 5-way batched sumcheck); the remaining failure is not in the rd_written path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three independent fixes that bring sha256_inline.elf much closer to verifying against Jolt's verifier (Stage 6 BCRAF per-stage claims now all match between prover and bytecode-derived val polys): 1. constraints.zig: hasLookupTable now returns true for opcode 0x7B (synthetic VirtualRev8W). Without this the trace witness silently skipped flag computation for VirtualRev8W cycles, leaving cf[0] AddOperands and cf[6] WriteLookupOutputToRD as zero while the bytecode entry had them set. 2. stage5_instances.zig: add 0x7B to left_is_rs1 / lookup-operand computation, mirroring VirtualZeroExtendWord (0x42), and split getLookupTableIndex's funct3==7 case so ANDN (funct7=0x20) routes to table 3 (AndnTable) instead of collapsing into table 2 (AND). The two were aliased before, causing LookupTableFlag[2,3] sums to diverge between the trace polynomial and the bytecode val polys. 3. bytecode_preprocessing.zig: in the SHA256 inline expansion path, convert the shamt stored in InlineInstr.imm into the multiplier (1 << shamt) before populating the VirtualMULI bytecode entry, matching Jolt's VirtualMULI::operands.imm convention and the standalone SLLI->VirtualMULI handler. Stage 2 verification still fails for sha256_inline (RWC and OutputSumcheck instances disagree), but ProductVirt/InstrLookups/RAF plus all five Stage 6 BCRAF stages now match.
The SHA256 inline expansion in stepInline was calling self.ram.read / self.ram.write directly, which bypassed the I/O-region check. When the host program passes an output-region pointer (e.g. 0x7fffb000) as the state pointer to the SHA256 inline opcode, the resulting hash bytes were written to plain RAM instead of being captured into device.outputs. That left val_final[output_index] != val_io[output_index] when the verifier reconstructs the IO polynomial via eval_io_mle, which is what breaks Stage 2's RamReadWriteChecking and OutputSumcheck for sha256_inline. Switch the inline SD path to writeWordWithIO and the inline LD path to readWordWithIO so that I/O-region accesses follow the same code path as regular SDs/LDs.
The SB/SH/SW decomposition in stepSubWordStore was bypassing the IO check by calling self.ram.write directly for the final SD step. That left device.outputs empty for any program that writes its result via sub-word stores (which is what the Rust SDK does for typical small return values), so: 1. The trace's val_final and the verifier's val_io disagreed at the output region for any program with non-zero outputs (which is why sha256_inline failed Stage 2's OutputSumcheck and RamRafEvaluation). 2. The Fiat-Shamir preamble couldn't see the actual program output. Fixes: - writeWordWithIO now routes writes to BOTH device.outputs (for the preamble + val_io) AND ram.write (so val_final and subsequent read-modify-write LD steps observe the same byte values). - stepSubWordStore's SB/SH and SW SD-finalization steps now go through writeWordWithIO instead of self.ram.write directly. - proveJoltCompatibleWithDoryAndSrsAtAddress trims trailing zero bytes off the outputs slice before passing it to the JoltDevice. Jolt's RV64IMACVerifier::new applies the same truncation (program_io.outputs.truncate(...)) before running its preamble; if the prover doesn't trim too, the prover/verifier transcripts diverge for any program whose output ends in zero bytes (e.g. fib_sdk computing fib(0)=0, which used to "verify" only because the pre-fix outputs were always empty). - JoltProofWithDory now carries program_inputs / program_outputs / program_panic so the prove command can write a `<proof>.io` ark-serialize sidecar containing the JoltDevice next to the proof. - jolt-verifier reads `<proof>.io` (when present) and uses the deserialized JoltDevice as program_io for verification, replacing the previous "always empty default" behaviour. Falls back to the old default when no sidecar exists, preserving compatibility. After this commit: - All previously verifying ELFs (fibonacci, fibonacci_sdk, alloc_sdk, bitwise, collatz, factorial, gcd, primes, signed, sum, sha256_128, sha256_1024, primes_large) still verify. - sha256_inline.elf now passes Stage 2's RAF, Output, Product, and InstructionLookups instances and the Stage 6 BCRAF claims; the only remaining mismatch is Stage 2 RamReadWriteChecking (inst[0]). That is the next thing to chase.
In the inline instruction sequence handler for SD, memory_pre_value was read from ram.memory AFTER writeWordWithIO had already stored the new doubleword, so the TraceStep recorded the POST value instead of the pre-value. This corrupted the Stage 6 RamInc polynomial (ram_inc = memory_value - memory_pre_value = post - post = 0) and caused Stage 2 RamReadWriteChecking verification to fail for sha256_inline. Capture the pre-value before the write. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…EN inputs The verifier-style expected_output_claim path in InstructionReadRaf calls LookupTable.evaluateTableMLE for every table flag opening. UpperWord (table 13) was a panic, which made it impossible to drive the prover-side expected-claim diagnostic for any program that uses MULHU. Pow2 (table 21) is a single-operand identity-path table whose existing struct method asserts r.len == XLEN, but Jolt's evaluate_mle takes the full 2 * XLEN interleaved opening point. Add a length-aware dispatch that runs Jolt's product formula when r.len == 2*XLEN and falls back to the legacy single-operand variant otherwise. Also raise @setEvalBranchQuota in the few MLEs whose `inline for` over XLEN exceeds the comptime branch limit when called from runtime-dispatched paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VirtualSRA.evaluateMLE was using r[2*(XLEN-1)] as the sign bit and folding the sign-extension contribution inside the iteration loop. Jolt's evaluate_mle (jolt-core/src/zkvm/lookup_table/virtual_sra.rs) uses r[0] as the sign bit and accumulates a separate sign_extension term weighted by 2^i (skipping i == 0). Rewrite to match Jolt exactly. ShiftRightBitmask.evaluateMLE assumed a single-operand `r.len == XLEN` input and brute-forced the sum, while Jolt's evaluate_mle expects `r.len == 2 * XLEN` and reads the LAST log2(XLEN) elements as the encoded shift amount. Add a 2*XLEN path that mirrors Jolt and keep the legacy path for the 8-bit test harness. These functions are currently only reachable via the verifier-style expected-claim diagnostic path (evaluateTableMLE), but were silently producing wrong values any time the diagnostic was run on a program whose Stage 5 cycles exercised these tables. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The local `rev8w` helper in src/zkvm/lookup_table/prefixes.zig only processed the lower 32 bits of its u64 input — it masked off the upper half before byte-swapping. Jolt's tracer::instruction::virtual_rev8w::rev8w reverses bytes within EACH 32-bit half independently and concatenates them back into a u64. This helper is consumed by the Rev8W prefix MLE and its checkpoint update, so any prefix evaluation involving bits in the upper half of the lookup index produced a value different from Jolt's, causing Stage 5 InstructionReadRaf to disagree with the verifier for any program that exercises VirtualRev8WTable (table 24). sha256_inline triggers this because the inline SHA256 expansion emits VirtualRev8W cycles for endian conversion. Match Jolt's full-u64 implementation: (v as u32).swap_bytes() as u64 + (((v >> 32) as u32).swap_bytes() as u64) << 32 Verified via the existing REMAT_VS_MLE diagnostic: with this fix, stored_table_values[24] computed via prefix/suffix decomposition now agrees with the direct MLE evaluation at r_address for both fib_sdk (unused → was canceling) and sha256_inline (used → was bug). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…MLEs Jolt's `rightShiftWPrefixMle` and `leftShiftWHelperPrefixMle` use Rust's `F::from_u32(1 << y.leading_ones())` and `F::from_u32(u32::from(x) >> y.trailing_zeros())`. In release mode, `u32` shift counts are masked mod 32, so `1u32 << 32` wraps to `1u32 << 0 = 1` (NOT 0) and `x >> 32` wraps to `x >> 0 = x`. Zolt previously had conservative `if (shift >= 64) return result` / `else 0` branches which zero out the result at those boundaries — diverging from Jolt's wrapping behavior when `y.leading_ones() == 32` (y is all-ones in its bit width) or `y.trailing_zeros() == 32` (y == 0). This alignment doesn't fix sha256_inline's Stage 5 failure on its own, but it removes a known-divergence from the Jolt reference that could quietly corrupt prefix MLE evaluations for any trace that happens to hit these edge cases. All existing regressions (fibonacci_sdk, fibonacci, bitwise, alloc_sdk, collatz, sha256_1024) continue to verify. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gated behind ZOLT_S5_DEBUG=1 env var, dumps at the end of Stage 5 lookups: - current_claim (tracked through sumcheck round Horner updates) - combined_vals[0], lookups_current_scalar, E_in[0], E_out[0], ra_product - self_check = scalar * E_in[0] * E_out[0] * combined_vals[0] * Π ra_chunks If the polynomial state is internally consistent with the round polynomial chain, self_check must equal current_claim. For sha256_inline this invariant currently fails (self_check_eq_current=false), pinpointing a per-round computation bug somewhere in proverMsgReadChecking / proverMsgRaf — the next step for the Stage 5 InstructionReadRaf investigation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ord, VirtualROTRW
Adds Zig ports of Jolt's `prefix_suffix_test::<XLEN, F, T>()` (jolt-core
test.rs:49) for the three tables that sha256_inline uses and that fib_sdk /
sha256_1024 do not — these were the prime suspects for the Stage 5
InstructionReadRaf drift.
The test walks 50 random lookup indices through all 8 × 16 = 128 address
rounds, binding prefix bits incrementally, and at each round compares
combine(prefix_evals, suffix_evals) against the table MLE evaluated
at the reconstructed 128-element field point. A mismatch would pinpoint a
bug in either the prefix MLE, the suffix MLE, the combine formula, or the
prefix checkpoint updates.
Results:
- Andn (table 3): PASS
- LowerHalfWord (table 19): PASS
- VirtualROTRW (table 28): PASS (with bitmask-format index generator
matching Jolt's `gen_bitmask_lookup_index`; VirtualROTRW's MLE is only
well-defined when the right operand y is a contiguous-leading-ones
bitmask, so the generator restricts to that subset.)
This narrows the sha256_inline Stage 5 drift to components OTHER than
these three tables' address-round prefix-suffix decomposition. Remaining
suspects: RAF (left/right/identity prefix MLEs), rematerialization at
round 128, cycle-round polynomial computation, or a prefix checkpoint
update for a table that is mid-chain.
Also fixes a stale test in src/tracer/sha256_inline.zig that referenced
InstrKind.SRLI (removed in favor of VirtualSRLI) — this was blocking
`zig build test` entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ndHalfWord Adds the SignExtendHalfWord (table 20) prefix-suffix decomposition test alongside the Andn / LowerHalfWord / VirtualROTRW tests. sha256_inline hits this table via its VirtualSignExtendWord (0x0B) cycles inside the SRLIW decomposition. Test passes — ruling out this table as the source of the Stage 5 drift. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs the full address-rounds sumcheck at small scale (total_len=8, chunk_len=2, 4 phases × 2 rounds) for a single random cycle and verifies that: left_ps.bound_value == OperandPolynomial::Left::evaluate(r_challenges) right_ps.bound_value == OperandPolynomial::Right::evaluate(r_challenges) identity_ps.bound_value == IdentityPolynomial::evaluate(r_challenges) Covers 30 random trials. Passes, confirming that RafDecomposition's prefix_mle initialization + bind correctly tracks the three RAF polynomials through a complete phased sumcheck. This is weaker than the per-round test (it only checks the final bound value, not intermediate round polynomials), but it's a strong sanity check that the prefix cached-table approach is correct. Sha256_inline's Stage 5 drift must live in the round-polynomial computation (proverMsgRaf), the multi-cycle Q-poly aggregation, cycle rounds, or rematerialization — not in the single-cycle RAF bound-value chain. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r sha256_inline The SHA-256 inline expansion folds K[i]/BLOCK[i] constants and emits ADDI with the full 32-bit constant in the immediate field of `InlineInstr.imm`. When stepInline encoded these as RISC-V instruction words, the 12-bit ADDI imm field truncated the constant to its lower 12 bits sign-extended, so the trace's `instruction` field could no longer reproduce the actual rd value (rd = rs1 + full_imm). Stage 5's InstructionReadRaf brute force exposed the symptom: for sha256_inline the prover's `materializeTableEntry(table, lookup_idx)` (computed from the truncated 12-bit imm) disagreed with `step.rd_value` (computed from the full imm) on 64 cycles — exactly the K[i] folded constants. fib_sdk and the other non-inline baselines have 0 such mismatches. Schema-level fix: - Add `inline_full_imm: u64 = 0` and `has_full_imm: bool = false` to TraceStep. stepInline sets both for ADDI/XORI/ANDI emitted from sha256_inline. Imm-decoding paths updated to honour the override: - `decodeImmediateInt` (trace_witness.zig integer path) — used by the raw R1CS witness Imm and RightInstructionInput fields. - `computeU128LookupOperandInt` (trace_witness.zig) — feeds Stage 5 RightLookupOperand for identity-path AddOperands. - `processTraceCycleCombined` in stage5_instances.zig — Stage 5 `lookup_indices_u128`, `right_op`, and the `imm_val` used for RightInstructionInput. - `computeLookupIndex` in stage6_helpers.zig — used by the InstructionRa one-hot polynomial commitment (proving_pipeline.zig:811), Stage 6 LookupsRaVirtual prover, Stage 6 Booleanity prover, and Stage 7. - `populateEntryFromJoltInstruction` in bytecode_entries.zig — restores the full u64 imm on the bytecode entry after `populateEntryFromInstruction` decodes only the truncated 12 bits from the synthetic instruction word. Independent off-by-one fix surfaced by the brute force: - `LookupTable.materializeTableEntry` had odd/even bit positions swapped AND had stale table indices (14, 19-28, 30 were shifted relative to `getLookupTableIndex`/`evaluateTableMLE`). Rewrote both bit extraction to match Jolt's `uninterleave_bits` (LEFT = ODD bits, RIGHT = EVEN bits) and the table dispatch to match the production indices. Function is diagnostic-only — bug was dormant. Diagnostics added (env-gated, off in production): - `ZOLT_S5_BRUTE` (stage5_lookups.zig + stage5_prover.zig): brute-force reconstruction of the InstructionReadRaf round-0 polynomial from cycle data with per-table breakdown, plus per-cycle materializeTableEntry vs trace.rd_value comparison. - `ZOLT_S6_DEBUG` (stage6_prover.zig): per-instance Stage 6 input claims and per-cycle bytecode-entry-imm vs trace-step-inline_full_imm sanity check. Dead diagnostic code removed in streaming_outer.zig and stage3_prover.zig (referenced no-longer-existing struct fields, blocking compilation when debug_verbose=true). Status: - sha256_inline: Stage 5 InstructionReadRaf now passes (eval_0/eval_1/claim all match brute force, 0 PERCYCLE mismatches across 4096 cycles). Stage 6 batched sumcheck still fails — separate root cause that only became reachable after the Stage 5 fix. - All baselines verify: fibonacci_sdk, fibonacci, bitwise, alloc_sdk, collatz, sha256_1024. - `zig build test` passes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All `examples/sha256*.elf` are now Jolt SDK guest binaries that call `jolt_inlines_sha2::Sha256::digest` and verify end-to-end under both Zolt and Jolt's native prover (`jolt-bench`, rev 997c1543). ## Core correctness fix: VirtualRev8W in Stage 6 LookupsRaVirtual `stage6_helpers.zig::computeLookupIndex` had no case for opcode 0x7B (our internal synthetic for `VirtualRev8W`, used by the inline SHA-256 expansion for endian conversion). It fell through the default path, returning `interleaveBits(0, 0) = 0`. Meanwhile `stage5_instances.zig` hand-codes 0x7B as identity-path returning `rs1_value`, matching Jolt's `RISCVCycle<VirtualRev8W>::to_lookup_index`. Result: the committed `InstructionRa` polynomials had lookup_index=0 for every VirtualRev8W cycle while Stage 5 opened them as `rs1_value`, so the Stage 6 LookupsRaVirtual sumcheck was reducing two inconsistent polynomials. sha256_inline hit this at round 1 of the LookupsRaVirtual instance; fib_sdk/sha256_1024 never emitted 0x7B so they were unaffected. Found by a per-round brute-force diagnostic that computed `(1-r_curr)*q_raw(0) + r_curr*q_raw(1)` directly from the ra_polys at the first active round and compared to the incoming claim. For fib_sdk they matched, for sha256_inline they diverged at round 1 ─ before any sumcheck reduction, pointing at the inputs. Diffing the two hand-rolled identity-path opcode sets surfaced the missing 0x7B. ## New inline SHA-256 guest workspace `examples/sha2-inline-guests/` is a six-member Cargo workspace (`guest-64/128/512/1024/2048/inline`) pinned to the same Jolt rev as `jolt-bench`, so cargo reuses the already-cached git checkouts. Each guest is a tiny `#[jolt::provable]` function that hashes a fixed-size zero buffer. The workspace ships its own `linker.ld` (ported from jolt-core's `linker.ld.template`) and `.cargo/config.toml` matching the rustflags `zeroos-build` applies when `jolt build --mode no-std` runs, so we build guests via direct `cargo build --release --bin ...` without depending on whatever `jolt` CLI is installed locally. The built ELFs replace the previously-committed software-SHA binaries. `sha256_1024.elf` went from 9550 ms (software SHA of 1024 bytes, trace_length 524288) to 2611 ms (inline SHA, trace_length 65536) ─ a 3.7× speedup from the inline expansion alone. ## jolt-bench patches `jolt-inlines-sha2` is now a dependency with the `host` feature ─ its `#[ctor::ctor]` registers the SHA-256 sequence builder at startup, without which the tracer panics "No inline sequence builder registered for opcode=0x0b". An `#[allow(unused_imports)] use jolt_inlines_sha2 as _` in main.rs forces the crate to be linked. `MemoryConfig` now uses the jolt-sdk macro's default sizes (heap/stack 64 KiB, io/advice 4 KiB each). The historical 32 MiB/32 MiB/2 MB values only worked for software-SHA guests whose stack/io mismatches were hidden by the lack of MMU bounds checks; any SDK-built guest has compile-time `output_start` hardcoded from its `#[jolt::provable]` attributes and needs the runtime config to match. ## Memory leak fixes Three unrelated leaks uncovered by `zig build`'s gpa leak tracker while running the new regression set: - `stage5_prover.zig` allocated `lookups_ra_weights` but never passed it to `LookupsReadRafProver.init` — dead allocation. - `stage7_prover.zig`'s parallel G-table reduce merged right into left and returned left, leaking right's `[][]F` rows every time. Reworked `LocalG` as a struct carrying its own `Allocator` so the reduce function (which has no context parameter) can free merged partials. - `proving_pipeline.zig` never called `deinit()` on the inner `jolt_prover::JoltProver(F)` that lazy-allocates `_gpu_accel`/`_gpu_poly`/`_gpu_msm` in `enableGpu`. Added a matching `defer converter.deinit()`. ## Other small fixes - `common/jolt_device.zig::remapAddress` now returns `null` for addresses below `lowest_address` instead of panicking, so SDK guests whose traces touch IO/padding addresses don't crash the prover. - `isPowerOfTwo(0)` is asserted by `std.math`, so the advice-size zero check needs to come first in the `or` chain (Zig short- circuits). Reordered. ## Verification Full regression under `zig build test` + `./bench/run-bench.sh`: Program │ Cycles │ Trace Len │ Zolt (ms) │ Jolt (ms) │ Ratio ──────────────┼─────────┼───────────┼────────────┼────────────┼──────── sha256 │ 6647 │ 8192 │ 932.03 │ 1267.33 │ 0.74x sha256_128 │ 9375 │ 16384 │ 1196.43 │ 1594.30 │ 0.75x sha256_512 │ 27412 │ 32768 │ 2075.16 │ 2190.78 │ 0.95x sha256_1024 │ 50852 │ 65536 │ 2611.47 │ 2818.48 │ 0.93x sha256_2048 │ 97741 │ 131072 │ 4336.91 │ 4555.72 │ 0.95x sha256_inline │ 3690 │ 4096 │ 705.16 │ 1099.65 │ 0.64x TOTAL │ 11857.16 │ 13526.26 │ 0.88x All non-SHA baselines (fibonacci, fibonacci_sdk, bitwise, alloc_sdk, collatz) still verify with zero leaks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Brings the Zolt prover to feature-parity with Jolt's
jolt-inlines-sha2: allexamples/sha256*.elfnow use the inline SHA-256 compression opcode and verify end-to-end under both Zolt and Jolt's own prover (jolt-bench, rev997c1543).This PR is the culmination of the
worktree-jolt-inlinebranch (21 commits) — from the initial SDK ELF loader plumbing in3b611ef1through the MLE/prefix fixes in thefix(lookup_table): ...series to the final Stage 5/6 correctness work and the fresh guest workspace.What changes for end users
| Program | Trace Len │ Zolt (ms) │ Jolt (ms) │ Ratio │
|---------------|-----------|-----------|-----------|-------|
| sha256 | 8192 │ 932.03 │ 1267.33 │ 0.74x |
| sha256_128 | 16384 │ 1196.43 │ 1594.30 │ 0.75x |
| sha256_512 | 32768 │ 2075.16 │ 2190.78 │ 0.95x |
| sha256_1024 | 65536 │ 2611.47 │ 2818.48 │ 0.93x |
| sha256_2048 | 131072 │ 4336.91 │ 4555.72 │ 0.95x |
| sha256_inline | 4096 │ 705.16 │ 1099.65 │ 0.64x |
| TOTAL | │ 11857 │ 13526 │ 0.88x |
sha256_1024.elfwent from 9550 ms (old software-SHA guest, trace_length 524288) to 2611 ms (new inline guest, trace_length 65536) — a 3.7× speedup just from switching to the inline expansion.The headline correctness fix: VirtualRev8W in Stage 6 LookupsRaVirtual
src/zkvm/spartan/stage6_helpers.zig::computeLookupIndexhad no case for opcode0x7B(our internal synthetic forVirtualRev8W, the byte-swap-per-32-bit-half lookup table used by the inline SHA-256 endian conversion). It fell through the default path and returnedinterleaveBits(0, 0) = 0. Meanwhilestage5_instances.zig::processTraceCycleCombinedhand-codes0x7Bas identity-path returningrs1_value, matching Jolt'sRISCVCycle<VirtualRev8W>::to_lookup_index(jolt-core/src/zkvm/instruction/virtual_rev8w.rs:46).Result: the committed
InstructionRapolynomials (built fromcomputeLookupIndex) hadlookup_index=0for every VirtualRev8W cycle while Stage 5 opened them asrs1_value, so the Stage 6 LookupsRaVirtual sumcheck was proving consistency between two polynomials that disagreed by design.fib_sdkandsha256_1024(software SHA) never emit0x7B, so they stayed green whilesha256_inlinefailed.The bug was found by a per-round brute-force diagnostic that computed
(1 − r_curr) · q_raw(0) + r_curr · q_raw(1)directly from thera_polysat the very first active LookupsRaVirtual round and compared to the incoming claim:fib_sdk: matches at round 1 ✓sha256_inline: diverges at round 1 ✗Since round 1 is before any sumcheck reduction, the divergence had to be in the inputs — not in the sumcheck algorithm. Diffing the two hand-rolled identity-path opcode sets (
processTraceCycleCombinedvscomputeLookupIndex) immediately surfaced the missing0x7Bcase. Fix is 9 lines instage6_helpers.zig.New infrastructure:
examples/sha2-inline-guests/A six-member Cargo workspace pinned to the same Jolt rev (
997c1543) asjolt-bench, so cargo reuses the already-cached git checkouts. Each guest is a tiny#[jolt::provable]function that hashes a fixed-size zero buffer:guest-64,guest-128,guest-512,guest-1024,guest-2048— digest of[0u8; N]guest-inline— digest of&[](reproduces the smallest-possible inline trace,trace_length=4096)The workspace ships its own
linker.ld(ported fromjolt-core/src/linker.ld.templatewithjolt build --mode no-std --backtrace offdefaults baked in) and.cargo/config.tomlmatching the rustflags thatzeroos-build::build_binary_with_rustflagsapplies. Guests build via directcargo build --release --bin ...— no dependency on whateverjoltCLI is installed locally (which can drift from the pinned Jolt rev).See
examples/sha2-inline-guests/README.mdfor the full build instructions and the rationale for avoiding thejoltCLI.jolt-benchpatchesjolt-inlines-sha2dep withhostfeature. Its#[ctor::ctor]registers the SHA-256 sequence builder at startup. Without it,jolt-benchpanics "No inline sequence builder registered for opcode=0x0b" on any inline-SHA ELF. An#[allow(unused_imports)] use jolt_inlines_sha2 as _inmain.rsforces the crate to be linked (otherwise rustc drops it).MemoryConfigrealigned to macro defaults (heap 64 KiB, stack 64 KiB, io/advice 4 KiB each). The previous ad-hoc values (32 MiB/32 MiB/2 MB input/0 advice) only worked for software-SHA guests whose stack/io mismatches were hidden by the lack of MMU bounds checks. Any SDK-built guest has compile-timeoutput_starthardcoded from its#[jolt::provable]attributes viaMemoryLayout::new(), and needs the runtime config to match exactly or the tracer panics with "I/O overflow: Attempted to read from 0x7FFxxxxx".Memory leak fixes (all uncovered while running the SHA regression set)
zig build's gpa leak tracker now reportsleaks=0across every example:stage5_prover.zig—lookups_ra_weightswas allocated but never passed toLookupsReadRafProver.init(the prover has its own debug-only allocation inside). Dead allocation, removed.stage7_prover.zigG-table parallel reduce —reduceFn(a, b)mergedbintoaand returneda, but never freedb's[][]Frows. ReworkedLocalGfrom a bare[][]Finto a struct carrying its ownAllocatorso the reduce function (which has no context parameter) can free merged partials.proving_pipeline.zig— the innerjolt_prover::JoltProver(F)(converter) lazy-allocates_gpu_accel/_gpu_poly/_gpu_msmviaenableGpuduring the prove path but was neverdeinit'd. Addeddefer converter.deinit().Other small fixes
common/jolt_device.zig::remapAddressnow returnsnullfor addresses belowlowest_addressinstead of panicking with "Unexpected address". SDK guests can legitimately touch IO/padding addresses that lie below the advice region, and panicking there took down the whole prover.isPowerOfTwo(n) or n == 0assertion chain is reordered ton == 0 or isPowerOfTwo(n), becausestd.math.isPowerOfTwo(0)assertsint > 0and would panic before theorshort-circuits.Test plan
zig build test— passes, no regressions./bench/run-bench.sh(default 14 programs) — all VERIFIED./bench/run-bench.sh sha256 sha256_128 sha256_512 sha256_1024 sha256_2048 sha256_inline— all VERIFIED, Zolt wins 6/6leaks=0examples/sha256*.elfcontains the jolt-inline custom-0 compression marker instructions (verified viariscv64-elf-objdump)jolt-benchproves every new SHA ELF without panickingjolt-verifieraccepts every proof Zolt generates🤖 Generated with Claude Code