feat: jolt-inline SHA-256 end-to-end (all SHA examples use inlines, verify under Zolt + Jolt) by MatteoMer · Pull Request #75 · MatteoMer/zolt

MatteoMer · 2026-04-08T20:02:18Z

Summary

Brings the Zolt prover to feature-parity with Jolt's jolt-inlines-sha2: all examples/sha256*.elf now use the inline SHA-256 compression opcode and verify end-to-end under both Zolt and Jolt's own prover (jolt-bench, rev 997c1543).

This PR is the culmination of the worktree-jolt-inline branch (21 commits) — from the initial SDK ELF loader plumbing in 3b611ef1 through the MLE/prefix fixes in the fix(lookup_table): ... series to the final Stage 5/6 correctness work and the fresh guest workspace.

What changes for end users

| Program       | Trace Len | Zolt (ms) | Jolt (ms) | Ratio |
|---------------|-----------|-----------|-----------|-------|
| sha256        | 8192      | 932.03    | 1267.33   | 0.74x |
| sha256_128    | 16384     | 1196.43   | 1594.30   | 0.75x |
| sha256_512    | 32768     | 2075.16   | 2190.78   | 0.95x |
| sha256_1024   | 65536     | 2611.47   | 2818.48   | 0.93x |
| sha256_2048   | 131072    | 4336.91   | 4555.72   | 0.95x |
| sha256_inline | 4096      | 705.16    | 1099.65   | 0.64x |
| TOTAL         |           | 11857     | 13526     | 0.88x |

sha256_1024.elf went from 9550 ms (old software-SHA guest, trace_length 524288) to 2611 ms (new inline guest, trace_length 65536) — a 3.7× speedup just from switching to the inline expansion.
Zolt beats Jolt on every SHA row. Total across the SHA suite: -1669 ms (0.88×).
Every SHA ELF contains the custom-0 inline compression marker (verified via objdump).

The headline correctness fix: VirtualRev8W in Stage 6 LookupsRaVirtual

src/zkvm/spartan/stage6_helpers.zig::computeLookupIndex had no case for opcode 0x7B (our internal synthetic for VirtualRev8W, the byte-swap-per-32-bit-half lookup table used by the inline SHA-256 endian conversion). It fell through the default path and returned interleaveBits(0, 0) = 0. Meanwhile stage5_instances.zig::processTraceCycleCombined hand-codes 0x7B as identity-path returning rs1_value, matching Jolt's RISCVCycle<VirtualRev8W>::to_lookup_index (jolt-core/src/zkvm/instruction/virtual_rev8w.rs:46).

Result: the committed InstructionRa polynomials (built from computeLookupIndex) had lookup_index=0 for every VirtualRev8W cycle while Stage 5 opened them as rs1_value, so the Stage 6 LookupsRaVirtual sumcheck was proving consistency between two polynomials that disagreed by design. fib_sdk and sha256_1024 (software SHA) never emit 0x7B, so they stayed green while sha256_inline failed.
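The mismatch can be reproduced in miniature. The sketch below is not the Zolt code: the function names and the `OP_ALU` opcode are illustrative assumptions, and only opcode 0x7B and the `interleaveBits(0, 0) = 0` fall-through come from the description above. It shows two hand-rolled dispatchers that agree on an ordinary opcode but disagree on 0x7B until the missing arm is added.

```rust
// Illustrative opcodes: 0x7B is the synthetic VirtualRev8W opcode named in the PR;
// OP_ALU is a stand-in for any opcode on the interleaved path (assumption).
const OP_VIRTUAL_REV8W: u8 = 0x7B;
const OP_ALU: u8 = 0x33;

/// Interleave two 32-bit operands: x lands on odd bit positions, y on even ones.
fn interleave_bits(x: u32, y: u32) -> u64 {
    let mut out = 0u64;
    for i in 0..32u32 {
        out |= (((x >> i) & 1) as u64) << (2 * i + 1);
        out |= (((y >> i) & 1) as u64) << (2 * i);
    }
    out
}

/// Stage 5 style dispatcher: 0x7B is on the identity path and returns rs1.
fn lookup_index_stage5(opcode: u8, rs1: u64, rs2: u64) -> u64 {
    match opcode {
        OP_VIRTUAL_REV8W => rs1,
        OP_ALU => interleave_bits(rs1 as u32, rs2 as u32),
        _ => interleave_bits(0, 0),
    }
}

/// Pre-fix Stage 6 style dispatcher: no 0x7B arm, so it hits the default.
fn lookup_index_stage6_buggy(opcode: u8, rs1: u64, rs2: u64) -> u64 {
    match opcode {
        OP_ALU => interleave_bits(rs1 as u32, rs2 as u32),
        _ => interleave_bits(0, 0),
    }
}

/// Post-fix Stage 6 style dispatcher: same identity-path arm as Stage 5.
fn lookup_index_stage6_fixed(opcode: u8, rs1: u64, rs2: u64) -> u64 {
    match opcode {
        OP_VIRTUAL_REV8W => rs1,
        other => lookup_index_stage6_buggy(other, rs1, rs2),
    }
}
```

Keeping the identity-path opcode set in one shared place would make this class of drift impossible, but that is a design suggestion rather than something this PR does.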

The bug was found by a per-round brute-force diagnostic that computed (1 − r_curr) · q_raw(0) + r_curr · q_raw(1) directly from the ra_polys at the very first active LookupsRaVirtual round and compared to the incoming claim:

For fib_sdk: matches at round 1 ✓
For sha256_inline: diverges at round 1 ✗

Since round 1 is before any sumcheck reduction, the divergence had to be in the inputs — not in the sumcheck algorithm. Diffing the two hand-rolled identity-path opcode sets (processTraceCycleCombined vs computeLookupIndex) immediately surfaced the missing 0x7B case. Fix is 9 lines in stage6_helpers.zig.
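The first-round check used by the diagnostic is just a linear interpolation compared against the incoming claim. A minimal sketch follows, assuming a toy prime field (2^61 - 1) instead of Jolt's actual field; the function names are hypothetical and the real diagnostic derives `q_raw(0)` and `q_raw(1)` from the ra_polys rather than taking them as arguments.

```rust
// Toy Mersenne prime modulus, chosen only so products fit in u128 (assumption).
const P: u128 = (1u128 << 61) - 1;

// All inputs are assumed to be already reduced modulo P.
fn add(a: u128, b: u128) -> u128 { (a + b) % P }
fn sub(a: u128, b: u128) -> u128 { (a + P - b) % P }
fn mul(a: u128, b: u128) -> u128 { (a * b) % P }

/// Evaluate (1 - r) * q0 + r * q1, the degree-1 interpolant through
/// (0, q0) and (1, q1), at the point r.
fn lerp(q0: u128, q1: u128, r: u128) -> u128 {
    add(mul(sub(1, r), q0), mul(r, q1))
}

/// True when the value recomputed from the raw polynomials matches the claim
/// handed to the round. A mismatch at the very first round points at the inputs.
fn first_round_consistent(q0: u128, q1: u128, r: u128, claim: u128) -> bool {
    lerp(q0, q1, r) == claim % P
}
```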

New infrastructure: `examples/sha2-inline-guests/`

A six-member Cargo workspace pinned to the same Jolt rev (997c1543) as jolt-bench, so cargo reuses the already-cached git checkouts. Each guest is a tiny #[jolt::provable] function that hashes a fixed-size zero buffer:

guest-64, guest-128, guest-512, guest-1024, guest-2048 — digest of [0u8; N]
guest-inline — digest of &[] (reproduces the smallest-possible inline trace, trace_length=4096)

The workspace ships its own linker.ld (ported from jolt-core/src/linker.ld.template with jolt build --mode no-std --backtrace off defaults baked in) and .cargo/config.toml matching the rustflags that zeroos-build::build_binary_with_rustflags applies. Guests build via direct cargo build --release --bin ... — no dependency on whatever jolt CLI is installed locally (which can drift from the pinned Jolt rev).

See examples/sha2-inline-guests/README.md for the full build instructions and the rationale for avoiding the jolt CLI.

`jolt-bench` patches

jolt-inlines-sha2 dep with host feature. Its #[ctor::ctor] registers the SHA-256 sequence builder at startup. Without it, jolt-bench panics "No inline sequence builder registered for opcode=0x0b" on any inline-SHA ELF. An #[allow(unused_imports)] use jolt_inlines_sha2 as _ in main.rs forces the crate to be linked (otherwise rustc drops it).
MemoryConfig realigned to macro defaults (heap 64 KiB, stack 64 KiB, io/advice 4 KiB each). The previous ad-hoc values (32 MiB/32 MiB/2 MB input/0 advice) only worked for software-SHA guests whose stack/io mismatches were hidden by the lack of MMU bounds checks. Any SDK-built guest has compile-time output_start hardcoded from its #[jolt::provable] attributes via MemoryLayout::new(), and needs the runtime config to match exactly or the tracer panics with "I/O overflow: Attempted to read from 0x7FFxxxxx".

Memory leak fixes (all uncovered while running the SHA regression set)

zig build's gpa leak tracker now reports leaks=0 across every example:

stage5_prover.zig — lookups_ra_weights was allocated but never passed to LookupsReadRafProver.init (the prover has its own debug-only allocation inside). Dead allocation, removed.
stage7_prover.zig G-table parallel reduce — reduceFn(a, b) merged b into a and returned a, but never freed b's [][]F rows. Reworked LocalG from a bare [][]F into a struct carrying its own Allocator so the reduce function (which has no context parameter) can free merged partials.
proving_pipeline.zig — the inner jolt_prover::JoltProver(F) (converter) lazy-allocates _gpu_accel/_gpu_poly/_gpu_msm via enableGpu during the prove path but was never deinit'd. Added defer converter.deinit().

Other small fixes

common/jolt_device.zig::remapAddress now returns null for addresses below lowest_address instead of panicking with "Unexpected address". SDK guests can legitimately touch IO/padding addresses that lie below the advice region, and panicking there took down the whole prover.
The isPowerOfTwo(n) or n == 0 assertion chain is reordered to n == 0 or isPowerOfTwo(n), because std.math.isPowerOfTwo(0) asserts int > 0 and would panic before the or short-circuits.
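The ordering matters only because the power-of-two helper rejects zero. The sketch below models that with a hypothetical Rust helper that asserts `n > 0` (Rust's built-in `usize::is_power_of_two` does not panic on zero, so a stand-in is used); short-circuit evaluation then guarantees the helper is never called with 0.

```rust
/// Stand-in for a helper that, like the one described above, asserts n > 0.
fn is_power_of_two_strict(n: usize) -> bool {
    assert!(n > 0, "is_power_of_two_strict requires n > 0");
    n & (n - 1) == 0
}

/// Zero check first: `||` short-circuits, so the strict helper never sees 0.
fn valid_len(n: usize) -> bool {
    n == 0 || is_power_of_two_strict(n)
}
```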

Test plan

zig build test — passes, no regressions
./bench/run-bench.sh (default 14 programs) — all VERIFIED
./bench/run-bench.sh sha256 sha256_128 sha256_512 sha256_1024 sha256_2048 sha256_inline — all VERIFIED, Zolt wins 6/6
All non-SHA baselines (fibonacci, fibonacci_sdk, bitwise, alloc_sdk, collatz) still verify with leaks=0
Every examples/sha256*.elf contains the jolt-inline custom-0 compression marker instructions (verified via riscv64-elf-objdump)
jolt-bench proves every new SHA ELF without panicking
jolt-verifier accepts every proof Zolt generates

🤖 Generated with Claude Code

… fix

This change adds infrastructure for Jolt SDK ELFs and the jolt-inlines/sha2 SHA256 precompile, plus the bug fix that makes SDK fibonacci verify against Jolt's verifier.

Why

- SDK ELFs (compiled with jolt-sdk) introduced new instruction patterns (CSRRW/CSRRS/MRET in _start, custom 0x5B advice/host-IO opcodes, JAL/JALR with rd=x0 remapped to virtual register 40, FENCE) that the prior Zolt prover did not handle.
- Jolt-inline replaces crypto operations in guest programs with custom RISC-V opcodes (0x0B); the bytecode preprocessing, lookup tables, and trace builder all needed to recognize and expand these.
- The single most important bug for SDK ELF verification was a mismatch between the bytecode val_polys (Stage 4 RegistersRWC) and the trace's Rs1Ra polynomial: the trace step's `rs1_read` field was false for custom 0x5B opcodes (VirtualHostIO/Advice*), but the bytecode entry carried rs1=Some(0). The val_poly therefore added eq_table_4[0] for these entries while the Stage 4 sumcheck did not, producing diverging per-stage claims and Stage 6 sumcheck failure.

Fix

- src/tracer/mod.zig: in stepNormal, set `reads_rs1=true` for custom-2 opcode 0x5B with funct3 != 0/5 (the SDK Virtual{HostIO,Advice*} family). funct3=0/5 are VirtualSRL/SRA which have their own step handlers that set rs1_read/rs2_read explicitly, so we leave them alone.
- src/zkvm/proving_pipeline.zig: pass device.memory_layout.termination through to the prover-internal BytecodePreprocessing so the synthetic termination LUI/ADDI/SD entries match the exporter's. Defensive: this does not by itself cause Stage 6 failure on the test workloads but keeps the two preps byte-for-byte identical.

Other infrastructure (used by jolt-inline and SDK)

- Decoder: handle opcode 0x0B (custom-0 jolt-inline), 0x5B (SDK custom-2 VirtualHostIO/AdviceLoad/AdviceLen), 0x6B (custom-3 VirtualROTRI/W), CSRRW/CSRRS/MRET decoding, and JAL/JALR rd=x0 remapping to vr40.
- bytecode_preprocessing.zig: decompose CSRRW/CSRRS/MRET into ADDI/OR/JALR sequences matching Jolt's tracer's inline_sequence exactly (vr34..vr39 CSR registers, vr40 temp), and synthetic termination JAL with rd=vr40.
- Lookup tables: implement VirtualROTR (idx 27) and VirtualROTRW (idx 28) MLE evaluations and materialize entries.
- Lookups + lookup_trace: add Andn / VirtualROTRI / VirtualROTRIW types and recorders.
- spartan/bytecode_entries.zig: add ANDN / VirtualROTRI / VirtualROTRIW entry population, FENCE/ECALL/CSR/MRET handling in populateEntryFromJoltInstruction, JAL rd=0→vr40 remapping in entry population, IsLastInSequence post-processing for JALR vsr=0.
- spartan/stage5_*: add 0x6B and 0x5B handling in opcode dispatch.
- r1cs/constraints.zig + trace_witness.zig: 0x5B funct3 guards on the compact integer R1CS path, 0x6B/0x0B/0x2B handling.
- tracer/sha256_inline.zig: NEW. SHA256 sequence builder ported from jolt-inlines/sha2 (~500 lines), expands a 0x0B SHA256 instruction into the ~550 virtual instructions Jolt produces.
- jolt-verifier: pin to a Jolt revision that matches the proof format.

Status

- Plain C ELFs (fibonacci, sha256, sha256_128, sha256_2048, collatz, primes_large) verify against Jolt's verifier (unchanged).
- examples/fibonacci_sdk.elf VERIFIES against Jolt's verifier; it was previously failing at Stage 6 with SumcheckVerificationError.
- examples/sha256_inline.elf still panics in the prover at evaluators.zig:1013 (constraint violation in Az/Bz product). This is a pre-existing issue inside the inline SHA256 sequence; tracked as a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…lRev8W The R1CS witness's `next_is_virtual` and `is_virtual_opcode` checks treated opcode 0x5B with funct3=0 or funct3=5 as always-virtual. In upstream Jolt's custom-2 opcode space, those funct3 values map to first-class ELF instructions (VirtualRev8W and AdviceLW respectively), not virtual sequences. The bogus "virtual" classification fired R1CS constraint 18 (MustStartSequenceFromBeginning) on every transition from a real ELF instruction into one of these custom-2 instructions, and the prover hit the constraint-satisfaction assertion in interpolateAzBzProductInt while building the witness for sha256_inline.elf. Fix: - src/zkvm/r1cs/trace_witness.zig: drop the 0x5B-funct3=0/5 branches in both `is_virtual_opcode` and `next_is_virtual`. The vsr-based check (`virtual_sequence_remaining > 0` or `is_last_in_sequence`) already catches internal expansions like our SHA256-inline VirtualSRLI sub-steps. Also add the start of VirtualRev8W support (the byte-swap-each-32-bit-half helper used by jolt-inlines/sha2's swap_bytes()): - jolt_instruction.zig + decoder: VirtualRev8W variant decoded from 0x5B funct3=0; AdviceLB/H/W/D variants for funct3=3..6; VirtualAdviceLoad as the inline-sequence target. - spartan/bytecode_entries.zig: populateEntryFromJoltInstruction handler for VirtualRev8W (lookup table 24, AddOperands+WriteLookupOutputToRD, LeftOperandIsRs1Value); SDK NoOp-like handler covers AdviceLB/H/W/D. - instruction/mod.zig: LookupTables.VirtualRev8W variant + materializeEntry. - instruction/lookups.zig: VirtualRev8WLookup type. - instruction/lookup_trace.zig: recordVirtualRev8W. - lookup_table/mod.zig: VirtualRev8W table struct + evaluateMLE; wired into evaluateTableMLE at index 24 (matches Jolt 997c1543's enum ordering). - tracer/mod.zig: stepVirtualRev8W single-cycle handler that computes rd = byte_swap_per_half(rs1) and emits the trace step. Status: - All 6 previously-passing ELFs (fibonacci, sha256, collatz, primes_large, sha256_128, fibonacci_sdk) still verify. 
- examples/sha256_inline.elf no longer panics in the prover; the prover now generates a complete proof. Verification still fails at Stage 2 because AdviceLW/AdviceLD inline-sequence expansion (VirtualAdviceLoad + SLLI + SRAI) is not yet implemented in the bytecode preprocessor or emulator. Tracked as a follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

AdviceL{B,H,W,D} instructions (Jolt SDK custom-2 opcode 0x5B funct3=3..6) are no longer treated as NoOps in the prover. They now expand to the same virtual sub-sequence Jolt's tracer produces via inline_sequence():

    AdviceLB/LH/LW (64-bit mode): VirtualAdvice → VirtualMULI (SLLI) → VirtualSRAI (SRAI)
    AdviceLD: VirtualAdvice (single cycle, already 64-bit)

- src/tracer/mod.zig:
  - Emulator gains an `advice_pos` cursor and `adviceTapeRead(num_bytes)` helper that consumes bytes from `device.untrusted_advice`, returning 0 once the tape is exhausted (a valid witness: advice is externally supplied and only range-checked by Jolt's proof system).
  - New `stepAdviceLoadSignExt` emits the 3-cycle (or 1-cycle for LD) sequence with correct vsr=2,1,0 and is_first=true,false,false flags, matching Jolt's finalize() layout. Sign-extension is performed by the shift pair for the sub-word loads.
  - `step()` dispatches 0x5B funct3=3..6 to `stepAdviceLoadSignExt`.
- src/zkvm/bytecode_preprocessing.zig: decode-time expansion of .AdviceLB/LH/LW into three JoltInstruction entries (VirtualAdvice, VirtualMULI, VirtualSRAI) and .AdviceLD into one (VirtualAdvice), so the exported bytecode matches the trace's virtual sub-sequence exactly.
- src/zkvm/proving_pipeline.zig: update `computeBytecodeCodeSizeWithTextSize` to count 3 entries for AdviceLB/LH/LW and 1 for the other custom-2 variants, so `bytecode_K` stays in sync with the expanded bytecode array.

Status:

- All 6 previously-passing ELFs still verify (fibonacci, sha256, collatz, primes_large, sha256_128, fibonacci_sdk).
- examples/sha256_inline.elf still fails verification at Stage 2, but the prover no longer panics. The remaining Stage 2 failure is in the combined InstructionLookupsClaimReduction / SpartanProductVirtualization / Ram* batch and needs further investigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
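The advice read and the shift-pair sign extension described in this commit can be sketched as follows. This is a simplified model, not the Zolt emulator: the `AdviceTape` type, the little-endian byte order, and zero-filling of partial reads past the end of the tape are assumptions made for illustration. The left shift is written as a multiply by a power of two, mirroring how SLLI is represented as VirtualMULI.

```rust
/// Hypothetical model of an advice tape with a read cursor.
struct AdviceTape {
    bytes: Vec<u8>,
    pos: usize,
}

impl AdviceTape {
    /// Read `num_bytes` (1..=8) as a little-endian value (assumed byte order).
    /// Bytes past the end of the tape read as zero.
    fn read(&mut self, num_bytes: usize) -> u64 {
        let mut value = 0u64;
        for i in 0..num_bytes {
            let byte = self.bytes.get(self.pos + i).copied().unwrap_or(0);
            value |= (byte as u64) << (8 * i);
        }
        self.pos += num_bytes;
        value
    }
}

/// Sign-extend the low `num_bytes` (1..=8) of `raw` to 64 bits with a shift pair:
/// a left shift (as a multiply by 2^shift) followed by an arithmetic right shift.
fn sign_extend_by_shifts(raw: u64, num_bytes: u32) -> u64 {
    let shift = 64 - 8 * num_bytes;
    let shifted = raw.wrapping_mul(1u64 << shift); // SLLI expressed as a multiply
    ((shifted as i64) >> shift) as u64 // SRAI
}
```

For the 8-byte case the shift amount is zero, so the pair is a no-op, which is consistent with AdviceLD needing only the single VirtualAdvice cycle.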

Our internal VirtualSRLI (emitted by SRLI decomposition) uses opcode 0x5B funct3=0 as its synthetic trace-step encoding, which collides with Jolt's VirtualRev8W instruction that uses the same encoding at the ELF level. Make stepVirtualRev8W emit a synthetic trace instruction word with opcode 0x7B funct3=0 instead, so Stage 5's getLookupTableIndex and the is_identity_path classifier can distinguish it from VirtualSRLI without any bytecode-walking or opcode-overlapping tricks. The serialized bytecode JSON still uses the variant tag .VirtualRev8W so Jolt's verifier interprets it correctly. - src/tracer/mod.zig: stepVirtualRev8W builds a synthetic 0x7B funct3=0 instruction word with rd/rs1 set, rs2/imm=0. - src/zkvm/spartan/bytecode_entries.zig: - populateEntryFromJoltInstruction .VirtualRev8W sets entry.opcode=0x7B - getLookupTableIndex(0x7B, ...) → 24 (VirtualRev8WTable) - isKnownInstruction recognizes 0x7B - src/zkvm/spartan/stage5_instances.zig: getLookupTableIndex(0x7B) → 24, is_identity_path(0x7B) → true, identity-path lookup index = rs1. - src/zkvm/r1cs/trace_witness.zig: extractOperandFlags(0x7B) sets LeftOperandIsRs1Value + AddOperands + WriteLookupOutputToRD. decodeImmediateInt(0x7B) returns 0. computeU128LookupOperandInt(0x7B) returns rs1_value. readsRs1 includes 0x7B. All 6 previously-verifying ELFs still pass. sha256_inline still fails at Stage 2 — that failure is pre-existing and unrelated to this change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…dFlags The compact R1CS witness path's extractOperandFlags had no case for opcode 0x6B (custom-3, VirtualROTRI/VirtualROTRIW), leaving the per-cycle flags all false. This silently diverged from the bytecode entry's circuit/instruction flags for any cycle that executed a VirtualROTRI/W virtual instruction — this primarily matters for jolt-inline SHA256 which emits hundreds of VirtualROTRI sub-steps via its inline expansion. - src/zkvm/r1cs/trace_witness.zig: - extractOperandFlags handles 0x6B with LeftOperandIsRs1Value + RightOperandIsImm + WriteLookupOutputToRD (interleaved path, matching VirtualROTRILookup.instructionFlags()). - decodeImmediateInt reconstructs the bitmask from the rotation amount stored in the I-type imm field, so Stage 1 val_poly sees the same bitmask value the bytecode entry carries (populateVirtualROTRI stores the full 64-bit bitmask in entry.imm). All 6 previously-verifying ELFs still pass. sha256_inline still fails at Stage 2; the remaining failure is not in the 0x6B path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Removes the ZOLT_DUMP_TRACE diagnostic added during sha256_inline investigation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Our stepInline handler for the jolt-inline SHA256 expansion set rd_written = (!is_store and rd != 0), excluding rd=0 from RdWa tracking. But Jolt's convention (documented in stepNormal) is that *every* instruction with an rd field writes to rd, even when rd=0 — Jolt captures cpu.x[0] before/after execution and treats the (0→0) transition as a write. Excluding rd=0 causes the Stage 4 Rs1Ra/RdWa sumcheck claims for the SHA256 sub-cycles to diverge from what Jolt's verifier reconstructs from the serialized bytecode entries. - src/tracer/mod.zig: stepInline trace step now sets rd_written=!is_store, matching stepNormal. All 6 previously-verifying ELFs still pass. sha256_inline still fails at Stage 2 (the 5-way batched sumcheck); the remaining failure is not in the rd_written path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three independent fixes that bring sha256_inline.elf much closer to verifying against Jolt's verifier (Stage 6 BCRAF per-stage claims now all match between prover and bytecode-derived val polys):

1. constraints.zig: hasLookupTable now returns true for opcode 0x7B (synthetic VirtualRev8W). Without this the trace witness silently skipped flag computation for VirtualRev8W cycles, leaving cf[0] AddOperands and cf[6] WriteLookupOutputToRD as zero while the bytecode entry had them set.
2. stage5_instances.zig: add 0x7B to left_is_rs1 / lookup-operand computation, mirroring VirtualZeroExtendWord (0x42), and split getLookupTableIndex's funct3==7 case so ANDN (funct7=0x20) routes to table 3 (AndnTable) instead of collapsing into table 2 (AND). The two were aliased before, causing LookupTableFlag[2,3] sums to diverge between the trace polynomial and the bytecode val polys.
3. bytecode_preprocessing.zig: in the SHA256 inline expansion path, convert the shamt stored in InlineInstr.imm into the multiplier (1 << shamt) before populating the VirtualMULI bytecode entry, matching Jolt's VirtualMULI::operands.imm convention and the standalone SLLI->VirtualMULI handler.

Stage 2 verification still fails for sha256_inline (RWC and OutputSumcheck instances disagree), but ProductVirt/InstrLookups/RAF plus all five Stage 6 BCRAF stages now match.
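The third fix rests on the identity between a left shift and a multiply by a power of two. A minimal sketch, with hypothetical function names, of why the bytecode entry must carry `1 << shamt` rather than the raw shift amount:

```rust
/// Convert a shift amount (0..=63) into the multiplier a multiply-immediate
/// instruction needs in order to reproduce the shift.
fn muli_imm_from_shamt(shamt: u32) -> u64 {
    1u64 << shamt
}

/// A left shift expressed as a wrapping multiply by 2^shamt.
fn slli_via_muli(x: u64, shamt: u32) -> u64 {
    x.wrapping_mul(muli_imm_from_shamt(shamt))
}
```

Passing the raw shamt as the multiplier would compute `x * shamt` instead of `x << shamt`, which is the kind of divergence between trace and bytecode entry that the commit describes.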

The SHA256 inline expansion in stepInline was calling self.ram.read / self.ram.write directly, which bypassed the I/O-region check. When the host program passes an output-region pointer (e.g. 0x7fffb000) as the state pointer to the SHA256 inline opcode, the resulting hash bytes were written to plain RAM instead of being captured into device.outputs. That left val_final[output_index] != val_io[output_index] when the verifier reconstructs the IO polynomial via eval_io_mle, which is what breaks Stage 2's RamReadWriteChecking and OutputSumcheck for sha256_inline. Switch the inline SD path to writeWordWithIO and the inline LD path to readWordWithIO so that I/O-region accesses follow the same code path as regular SDs/LDs.

The SB/SH/SW decomposition in stepSubWordStore was bypassing the IO check by calling self.ram.write directly for the final SD step. That left device.outputs empty for any program that writes its result via sub-word stores (which is what the Rust SDK does for typical small return values), so:

1. The trace's val_final and the verifier's val_io disagreed at the output region for any program with non-zero outputs (which is why sha256_inline failed Stage 2's OutputSumcheck and RamRafEvaluation).
2. The Fiat-Shamir preamble couldn't see the actual program output.

Fixes:

- writeWordWithIO now routes writes to BOTH device.outputs (for the preamble + val_io) AND ram.write (so val_final and subsequent read-modify-write LD steps observe the same byte values).
- stepSubWordStore's SB/SH and SW SD-finalization steps now go through writeWordWithIO instead of self.ram.write directly.
- proveJoltCompatibleWithDoryAndSrsAtAddress trims trailing zero bytes off the outputs slice before passing it to the JoltDevice. Jolt's RV64IMACVerifier::new applies the same truncation (program_io.outputs.truncate(...)) before running its preamble; if the prover doesn't trim too, the prover/verifier transcripts diverge for any program whose output ends in zero bytes (e.g. fib_sdk computing fib(0)=0, which used to "verify" only because the pre-fix outputs were always empty).
- JoltProofWithDory now carries program_inputs / program_outputs / program_panic so the prove command can write a `<proof>.io` ark-serialize sidecar containing the JoltDevice next to the proof.
- jolt-verifier reads `<proof>.io` (when present) and uses the deserialized JoltDevice as program_io for verification, replacing the previous "always empty default" behaviour. Falls back to the old default when no sidecar exists, preserving compatibility.

After this commit:

- All previously verifying ELFs (fibonacci, fibonacci_sdk, alloc_sdk, bitwise, collatz, factorial, gcd, primes, signed, sum, sha256_128, sha256_1024, primes_large) still verify.
- sha256_inline.elf now passes Stage 2's RAF, Output, Product, and InstructionLookups instances and the Stage 6 BCRAF claims; the only remaining mismatch is Stage 2 RamReadWriteChecking (inst[0]). That is the next thing to chase.
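The output trimming mentioned in this commit amounts to dropping every byte after the last non-zero one. A minimal sketch, with a hypothetical function name and no claim to match the exact expression inside Jolt's `truncate(...)` call:

```rust
/// Drop trailing zero bytes so prover and verifier hash the same output bytes.
fn trim_trailing_zeros(outputs: &mut Vec<u8>) {
    // Index just past the last non-zero byte, or 0 if every byte is zero.
    let keep = outputs
        .iter()
        .rposition(|&b| b != 0)
        .map_or(0, |i| i + 1);
    outputs.truncate(keep);
}
```

An all-zero output, such as the fib(0)=0 case mentioned above, trims to the empty vector; interior zeros are preserved.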

In the inline instruction sequence handler for SD, memory_pre_value was read from ram.memory AFTER writeWordWithIO had already stored the new doubleword, so the TraceStep recorded the POST value instead of the pre-value. This corrupted the Stage 6 RamInc polynomial (ram_inc = memory_value - memory_pre_value = post - post = 0) and caused Stage 2 RamReadWriteChecking verification to fail for sha256_inline. Capture the pre-value before the write. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
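The ordering bug is easy to model: the pre-value has to be read before the store mutates memory. The sketch below uses a hypothetical `Ram` backed by a hash map and a hypothetical `StoreStep` record; it is not the Zolt TraceStep layout.

```rust
use std::collections::HashMap;

/// Hypothetical word-addressed memory; missing addresses read as zero.
struct Ram {
    words: HashMap<u64, u64>,
}

/// Hypothetical record of one store, as a trace step would need it.
struct StoreStep {
    pre_value: u64,
    post_value: u64,
}

impl Ram {
    /// Capture the old word first, then perform the write.
    fn store_traced(&mut self, addr: u64, value: u64) -> StoreStep {
        let pre_value = self.words.get(&addr).copied().unwrap_or(0);
        self.words.insert(addr, value);
        StoreStep { pre_value, post_value: value }
    }
}

/// Increment recorded for the step: post-value minus pre-value.
fn ram_inc(step: &StoreStep) -> i128 {
    step.post_value as i128 - step.pre_value as i128
}
```

Reading the pre-value after the insert would make `pre_value == post_value` and the increment zero for every store, which is the symptom the commit describes.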

…EN inputs The verifier-style expected_output_claim path in InstructionReadRaf calls LookupTable.evaluateTableMLE for every table flag opening. UpperWord (table 13) was a panic, which made it impossible to drive the prover-side expected-claim diagnostic for any program that uses MULHU. Pow2 (table 21) is a single-operand identity-path table whose existing struct method asserts r.len == XLEN, but Jolt's evaluate_mle takes the full 2 * XLEN interleaved opening point. Add a length-aware dispatch that runs Jolt's product formula when r.len == 2*XLEN and falls back to the legacy single-operand variant otherwise. Also raise @setEvalBranchQuota in the few MLEs whose `inline for` over XLEN exceeds the comptime branch limit when called from runtime-dispatched paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

VirtualSRA.evaluateMLE was using r[2*(XLEN-1)] as the sign bit and folding the sign-extension contribution inside the iteration loop. Jolt's evaluate_mle (jolt-core/src/zkvm/lookup_table/virtual_sra.rs) uses r[0] as the sign bit and accumulates a separate sign_extension term weighted by 2^i (skipping i == 0). Rewrite to match Jolt exactly. ShiftRightBitmask.evaluateMLE assumed a single-operand `r.len == XLEN` input and brute-forced the sum, while Jolt's evaluate_mle expects `r.len == 2 * XLEN` and reads the LAST log2(XLEN) elements as the encoded shift amount. Add a 2*XLEN path that mirrors Jolt and keep the legacy path for the 8-bit test harness. These functions are currently only reachable via the verifier-style expected-claim diagnostic path (evaluateTableMLE), but were silently producing wrong values any time the diagnostic was run on a program whose Stage 5 cycles exercised these tables. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The local `rev8w` helper in src/zkvm/lookup_table/prefixes.zig only processed the lower 32 bits of its u64 input: it masked off the upper half before byte-swapping. Jolt's tracer::instruction::virtual_rev8w::rev8w reverses bytes within EACH 32-bit half independently and concatenates them back into a u64.

This helper is consumed by the Rev8W prefix MLE and its checkpoint update, so any prefix evaluation involving bits in the upper half of the lookup index produced a value different from Jolt's, causing Stage 5 InstructionReadRaf to disagree with the verifier for any program that exercises VirtualRev8WTable (table 24). sha256_inline triggers this because the inline SHA256 expansion emits VirtualRev8W cycles for endian conversion.

Match Jolt's full-u64 implementation (parenthesized so the shift applies only to the upper half):

    ((v as u32).swap_bytes() as u64) + ((((v >> 32) as u32).swap_bytes() as u64) << 32)

Verified via the existing REMAT_VS_MLE diagnostic: with this fix, stored_table_values[24] computed via prefix/suffix decomposition now agrees with the direct MLE evaluation at r_address for both fib_sdk (unused → was canceling) and sha256_inline (used → was bug).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
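The difference between the two behaviours is easiest to see side by side. The sketch below is written in Rust after the expression quoted in the commit message; `rev8w_lower_only` is a hypothetical reconstruction of the old behaviour based on the description, not the removed Zig code.

```rust
/// Reverse the bytes within each 32-bit half of a u64 independently.
fn rev8w(v: u64) -> u64 {
    let lo = (v as u32).swap_bytes() as u64;
    let hi = ((v >> 32) as u32).swap_bytes() as u64;
    lo | (hi << 32)
}

/// Assumed shape of the old helper: the upper half is discarded before swapping.
fn rev8w_lower_only(v: u64) -> u64 {
    (v as u32).swap_bytes() as u64
}
```

The two agree whenever the upper 32 bits of the input are zero, which is consistent with the bug staying hidden on programs that never put data there.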

…MLEs Jolt's `rightShiftWPrefixMle` and `leftShiftWHelperPrefixMle` use Rust's `F::from_u32(1 << y.leading_ones())` and `F::from_u32(u32::from(x) >> y.trailing_zeros())`. In release mode, `u32` shift counts are masked mod 32, so `1u32 << 32` wraps to `1u32 << 0 = 1` (NOT 0) and `x >> 32` wraps to `x >> 0 = x`. Zolt previously had conservative `if (shift >= 64) return result` / `else 0` branches which zero out the result at those boundaries — diverging from Jolt's wrapping behavior when `y.leading_ones() == 32` (y is all-ones in its bit width) or `y.trailing_zeros() == 32` (y == 0). This alignment doesn't fix sha256_inline's Stage 5 failure on its own, but it removes a known-divergence from the Jolt reference that could quietly corrupt prefix MLE evaluations for any trace that happens to hit these edge cases. All existing regressions (fibonacci_sdk, fibonacci, bitwise, alloc_sdk, collatz, sha256_1024) continue to verify. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
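The boundary behaviour described here can be pinned down with Rust's explicit wrapping shifts, which mask the shift count modulo the bit width regardless of build profile. The helper names below are hypothetical and only isolate the two shift expressions quoted in the commit message; they are not Jolt's prefix MLE functions.

```rust
/// 1 << leading_ones(y), with the shift count masked mod 32.
/// For y = u32::MAX the count is 32, which wraps to a shift by 0.
fn pow2_of_leading_ones(y: u32) -> u32 {
    1u32.wrapping_shl(y.leading_ones())
}

/// x >> trailing_zeros(y), with the shift count masked mod 32.
/// For y = 0 the count is 32, which wraps to a shift by 0.
fn shr_by_trailing_zeros(x: u32, y: u32) -> u32 {
    x.wrapping_shr(y.trailing_zeros())
}
```

A port that returns 0 at these boundaries instead would disagree with the wrapped result exactly when `y` is all ones or zero.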

Gated behind ZOLT_S5_DEBUG=1 env var, dumps at the end of Stage 5 lookups: - current_claim (tracked through sumcheck round Horner updates) - combined_vals[0], lookups_current_scalar, E_in[0], E_out[0], ra_product - self_check = scalar * E_in[0] * E_out[0] * combined_vals[0] * Π ra_chunks If the polynomial state is internally consistent with the round polynomial chain, self_check must equal current_claim. For sha256_inline this invariant currently fails (self_check_eq_current=false), pinpointing a per-round computation bug somewhere in proverMsgReadChecking / proverMsgRaf — the next step for the Stage 5 InstructionReadRaf investigation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ord, VirtualROTRW Adds Zig ports of Jolt's `prefix_suffix_test::<XLEN, F, T>()` (jolt-core test.rs:49) for the three tables that sha256_inline uses and that fib_sdk / sha256_1024 do not — these were the prime suspects for the Stage 5 InstructionReadRaf drift. The test walks 50 random lookup indices through all 8 × 16 = 128 address rounds, binding prefix bits incrementally, and at each round compares combine(prefix_evals, suffix_evals) against the table MLE evaluated at the reconstructed 128-element field point. A mismatch would pinpoint a bug in either the prefix MLE, the suffix MLE, the combine formula, or the prefix checkpoint updates. Results: - Andn (table 3): PASS - LowerHalfWord (table 19): PASS - VirtualROTRW (table 28): PASS (with bitmask-format index generator matching Jolt's `gen_bitmask_lookup_index`; VirtualROTRW's MLE is only well-defined when the right operand y is a contiguous-leading-ones bitmask, so the generator restricts to that subset.) This narrows the sha256_inline Stage 5 drift to components OTHER than these three tables' address-round prefix-suffix decomposition. Remaining suspects: RAF (left/right/identity prefix MLEs), rematerialization at round 128, cycle-round polynomial computation, or a prefix checkpoint update for a table that is mid-chain. Also fixes a stale test in src/tracer/sha256_inline.zig that referenced InstrKind.SRLI (removed in favor of VirtualSRLI) — this was blocking `zig build test` entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ndHalfWord Adds the SignExtendHalfWord (table 20) prefix-suffix decomposition test alongside the Andn / LowerHalfWord / VirtualROTRW tests. sha256_inline hits this table via its VirtualSignExtendWord (0x0B) cycles inside the SRLIW decomposition. Test passes — ruling out this table as the source of the Stage 5 drift. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Runs the full address-rounds sumcheck at small scale (total_len=8, chunk_len=2, 4 phases × 2 rounds) for a single random cycle and verifies that:

    left_ps.bound_value == OperandPolynomial::Left::evaluate(r_challenges)
    right_ps.bound_value == OperandPolynomial::Right::evaluate(r_challenges)
    identity_ps.bound_value == IdentityPolynomial::evaluate(r_challenges)

Covers 30 random trials. Passes, confirming that RafDecomposition's prefix_mle initialization + bind correctly tracks the three RAF polynomials through a complete phased sumcheck.

This is weaker than the per-round test (it only checks the final bound value, not intermediate round polynomials), but it's a strong sanity check that the prefix cached-table approach is correct. sha256_inline's Stage 5 drift must live in the round-polynomial computation (proverMsgRaf), the multi-cycle Q-poly aggregation, cycle rounds, or rematerialization, not in the single-cycle RAF bound-value chain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…r sha256_inline The SHA-256 inline expansion folds K[i]/BLOCK[i] constants and emits ADDI with the full 32-bit constant in the immediate field of `InlineInstr.imm`. When stepInline encoded these as RISC-V instruction words, the 12-bit ADDI imm field truncated the constant to its lower 12 bits sign-extended, so the trace's `instruction` field could no longer reproduce the actual rd value (rd = rs1 + full_imm). Stage 5's InstructionReadRaf brute force exposed the symptom: for sha256_inline the prover's `materializeTableEntry(table, lookup_idx)` (computed from the truncated 12-bit imm) disagreed with `step.rd_value` (computed from the full imm) on 64 cycles — exactly the K[i] folded constants. fib_sdk and the other non-inline baselines have 0 such mismatches. Schema-level fix: - Add `inline_full_imm: u64 = 0` and `has_full_imm: bool = false` to TraceStep. stepInline sets both for ADDI/XORI/ANDI emitted from sha256_inline. Imm-decoding paths updated to honour the override: - `decodeImmediateInt` (trace_witness.zig integer path) — used by the raw R1CS witness Imm and RightInstructionInput fields. - `computeU128LookupOperandInt` (trace_witness.zig) — feeds Stage 5 RightLookupOperand for identity-path AddOperands. - `processTraceCycleCombined` in stage5_instances.zig — Stage 5 `lookup_indices_u128`, `right_op`, and the `imm_val` used for RightInstructionInput. - `computeLookupIndex` in stage6_helpers.zig — used by the InstructionRa one-hot polynomial commitment (proving_pipeline.zig:811), Stage 6 LookupsRaVirtual prover, Stage 6 Booleanity prover, and Stage 7. - `populateEntryFromJoltInstruction` in bytecode_entries.zig — restores the full u64 imm on the bytecode entry after `populateEntryFromInstruction` decodes only the truncated 12 bits from the synthetic instruction word. 
Independent off-by-one fix surfaced by the brute force:
- `LookupTable.materializeTableEntry` had odd/even bit positions swapped AND had stale table indices (14, 19-28, 30 were shifted relative to `getLookupTableIndex`/`evaluateTableMLE`). Rewrote both the bit extraction to match Jolt's `uninterleave_bits` (LEFT = ODD bits, RIGHT = EVEN bits) and the table dispatch to match the production indices. Function is diagnostic-only, so the bug was dormant.

Diagnostics added (env-gated, off in production):
- `ZOLT_S5_BRUTE` (stage5_lookups.zig + stage5_prover.zig): brute-force reconstruction of the InstructionReadRaf round-0 polynomial from cycle data with per-table breakdown, plus per-cycle materializeTableEntry vs trace.rd_value comparison.
- `ZOLT_S6_DEBUG` (stage6_prover.zig): per-instance Stage 6 input claims and per-cycle bytecode-entry-imm vs trace-step-inline_full_imm sanity check.

Dead diagnostic code removed in streaming_outer.zig and stage3_prover.zig (referenced no-longer-existing struct fields, blocking compilation when debug_verbose=true).

Status:
- sha256_inline: Stage 5 InstructionReadRaf now passes (eval_0/eval_1/claim all match brute force, 0 PERCYCLE mismatches across 4096 cycles). Stage 6 batched sumcheck still fails, a separate root cause that only became reachable after the Stage 5 fix.
- All baselines verify: fibonacci_sdk, fibonacci, bitwise, alloc_sdk, collatz, sha256_1024.
- `zig build test` passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
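The bit convention named in the `materializeTableEntry` fix (LEFT = ODD bits, RIGHT = EVEN bits) can be sketched as follows. This is a loop-based illustration written for clarity, not Jolt's or Zolt's actual implementation; the function names mirror those in the message, but the bodies and the `u128`/`u64` widths are assumptions.

```rust
/// Split an interleaved lookup index into (left, right) operands.
/// Convention from the text: LEFT = odd bit positions, RIGHT = even bit positions.
fn uninterleave_bits(index: u128) -> (u64, u64) {
    let (mut left, mut right) = (0u64, 0u64);
    for i in 0..64 {
        right |= (((index >> (2 * i)) & 1) as u64) << i;
        left |= (((index >> (2 * i + 1)) & 1) as u64) << i;
    }
    (left, right)
}

/// Inverse operation: place `right` on even positions and `left` on odd positions.
fn interleave_bits(left: u64, right: u64) -> u128 {
    let mut index = 0u128;
    for i in 0..64 {
        index |= (((right >> i) & 1) as u128) << (2 * i);
        index |= (((left >> i) & 1) as u128) << (2 * i + 1);
    }
    index
}

fn main() {
    // Bit 1 (odd) belongs to LEFT, bit 0 (even) belongs to RIGHT.
    println!("{:?}", uninterleave_bits(0b10));
    println!("{:?}", uninterleave_bits(0b01));
    // Swapping the two conventions silently exchanges the operands.
    let (left, right) = uninterleave_bits(interleave_bits(0xdead_beef, 0x1234_5678));
    println!("{:#x} {:#x}", left, right);
}
```

Swapping odd and even here exchanges the two operands, which goes unnoticed for symmetric tables and shows up only in tables where operand order matters.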

All `examples/sha256*.elf` are now Jolt SDK guest binaries that call `jolt_inlines_sha2::Sha256::digest` and verify end-to-end under both Zolt and Jolt's native prover (`jolt-bench`, rev 997c1543).

## Core correctness fix: VirtualRev8W in Stage 6 LookupsRaVirtual

`stage6_helpers.zig::computeLookupIndex` had no case for opcode 0x7B (our internal synthetic for `VirtualRev8W`, used by the inline SHA-256 expansion for endian conversion). It fell through the default path, returning `interleaveBits(0, 0) = 0`. Meanwhile `stage5_instances.zig` hand-codes 0x7B as identity-path returning `rs1_value`, matching Jolt's `RISCVCycle<VirtualRev8W>::to_lookup_index`.

Result: the committed `InstructionRa` polynomials had lookup_index=0 for every VirtualRev8W cycle while Stage 5 opened them as `rs1_value`, so the Stage 6 LookupsRaVirtual sumcheck was reducing two inconsistent polynomials. sha256_inline hit this at round 1 of the LookupsRaVirtual instance; fib_sdk/sha256_1024 never emitted 0x7B so they were unaffected.

Found by a per-round brute-force diagnostic that computed `(1-r_curr)*q_raw(0) + r_curr*q_raw(1)` directly from the ra_polys at the first active round and compared to the incoming claim. For fib_sdk they matched; for sha256_inline they diverged at round 1, before any sumcheck reduction, which pointed at the inputs. Diffing the two hand-rolled identity-path opcode sets surfaced the missing 0x7B.

## New inline SHA-256 guest workspace

`examples/sha2-inline-guests/` is a six-member Cargo workspace (`guest-64/128/512/1024/2048/inline`) pinned to the same Jolt rev as `jolt-bench`, so cargo reuses the already-cached git checkouts. Each guest is a tiny `#[jolt::provable]` function that hashes a fixed-size zero buffer.
The workspace ships its own `linker.ld` (ported from jolt-core's `linker.ld.template`) and `.cargo/config.toml` matching the rustflags `zeroos-build` applies when `jolt build --mode no-std` runs, so we build guests via direct `cargo build --release --bin ...` without depending on whatever `jolt` CLI is installed locally.

The built ELFs replace the previously-committed software-SHA binaries. `sha256_1024.elf` went from 9550 ms (software SHA of 1024 bytes, trace_length 524288) to 2611 ms (inline SHA, trace_length 65536), a 3.7× speedup from the inline expansion alone.

## jolt-bench patches

`jolt-inlines-sha2` is now a dependency with the `host` feature: its `#[ctor::ctor]` registers the SHA-256 sequence builder at startup, without which the tracer panics with "No inline sequence builder registered for opcode=0x0b". An `#[allow(unused_imports)] use jolt_inlines_sha2 as _` in main.rs forces the crate to be linked.

`MemoryConfig` now uses the jolt-sdk macro's default sizes (heap/stack 64 KiB, io/advice 4 KiB each). The historical 32 MiB/32 MiB/2 MB values only worked for software-SHA guests whose stack/io mismatches were hidden by the lack of MMU bounds checks; any SDK-built guest has compile-time `output_start` hardcoded from its `#[jolt::provable]` attributes and needs the runtime config to match.

## Memory leak fixes

Three unrelated leaks uncovered by `zig build`'s gpa leak tracker while running the new regression set:
- `stage5_prover.zig` allocated `lookups_ra_weights` but never passed it to `LookupsReadRafProver.init`: a dead allocation.
- `stage7_prover.zig`'s parallel G-table reduce merged right into left and returned left, leaking right's `[][]F` rows every time. Reworked `LocalG` as a struct carrying its own `Allocator` so the reduce function (which has no context parameter) can free merged partials.
- `proving_pipeline.zig` never called `deinit()` on the inner `jolt_prover::JoltProver(F)` that lazy-allocates `_gpu_accel`/`_gpu_poly`/`_gpu_msm` in `enableGpu`. Added a matching `defer converter.deinit()`.

## Other small fixes

- `common/jolt_device.zig::remapAddress` now returns `null` for addresses below `lowest_address` instead of panicking, so SDK guests whose traces touch IO/padding addresses don't crash the prover.
- `std.math.isPowerOfTwo` asserts that its argument is nonzero, so the advice-size zero check needs to come first in the `or` chain (Zig short-circuits). Reordered.

## Verification

Full regression under `zig build test` + `./bench/run-bench.sh`:

| Program | Cycles | Trace Len | Zolt (ms) | Jolt (ms) | Ratio |
|---------------|--------|-----------|-----------|-----------|-------|
| sha256 | 6647 | 8192 | 932.03 | 1267.33 | 0.74x |
| sha256_128 | 9375 | 16384 | 1196.43 | 1594.30 | 0.75x |
| sha256_512 | 27412 | 32768 | 2075.16 | 2190.78 | 0.95x |
| sha256_1024 | 50852 | 65536 | 2611.47 | 2818.48 | 0.93x |
| sha256_2048 | 97741 | 131072 | 4336.91 | 4555.72 | 0.95x |
| sha256_inline | 3690 | 4096 | 705.16 | 1099.65 | 0.64x |
| TOTAL | | | 11857.16 | 13526.26 | 0.88x |

All non-SHA baselines (fibonacci, fibonacci_sdk, bitwise, alloc_sdk, collatz) still verify with zero leaks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
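The `VirtualRev8W` inconsistency from the core correctness fix in this message can be modelled in a few lines. This is a hedged sketch in Rust, not the Zig code in `stage5_instances.zig` or `stage6_helpers.zig`: the two index functions, the `handles_rev8w` switch, and the toy trace are hypothetical stand-ins that only reproduce the described behaviour (Stage 5 returns `rs1_value` for 0x7B, the unfixed Stage 6 path returns 0). `rev8w` follows the description of the table as a byte swap within each 32-bit half.

```rust
/// Opcode tag for the synthetic VirtualRev8W instruction (0x7B in the text).
const VIRTUAL_REV8W: u8 = 0x7B;

/// Byte-swap each 32-bit half of a 64-bit value independently.
fn rev8w(x: u64) -> u64 {
    let lo = (x as u32).swap_bytes() as u64;
    let hi = ((x >> 32) as u32).swap_bytes() as u64;
    (hi << 32) | lo
}

/// Stage 5 style index (hypothetical model): identity path returns rs1_value.
fn stage5_lookup_index(opcode: u8, rs1_value: u64) -> u128 {
    match opcode {
        VIRTUAL_REV8W => rs1_value as u128,
        _ => 0,
    }
}

/// Stage 6 style index (hypothetical model). With `handles_rev8w == false`
/// the 0x7B case is missing and the default path yields 0.
fn stage6_lookup_index(opcode: u8, rs1_value: u64, handles_rev8w: bool) -> u128 {
    match opcode {
        VIRTUAL_REV8W if handles_rev8w => rs1_value as u128,
        _ => 0, // default path: interleaveBits(0, 0) = 0
    }
}

/// Count cycles on which the two stages disagree about the lookup index.
fn count_mismatches(trace: &[(u8, u64)], handles_rev8w: bool) -> usize {
    trace
        .iter()
        .filter(|&&(op, rs1)| {
            stage5_lookup_index(op, rs1) != stage6_lookup_index(op, rs1, handles_rev8w)
        })
        .count()
}

fn main() {
    // Toy trace of (opcode, rs1_value); 0x13 stands for any other opcode.
    let trace = [(0x13u8, 7u64), (VIRTUAL_REV8W, 0x0102_0304_0506_0708), (VIRTUAL_REV8W, 0xabcd)];
    println!("rev8w output: {:#018x}", rev8w(trace[1].1));
    println!("mismatches before fix: {}", count_mismatches(&trace, false));
    println!("mismatches after fix:  {}", count_mismatches(&trace, true));
}
```

A check of this shape, run over the real trace, is one way to compare two hand-maintained opcode sets directly instead of waiting for a sumcheck round to diverge.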

MatteoMer and others added 21 commits April 7, 2026 00:55

chore: remove temporary trace-summary debug helper

662c6aa

Removes the ZOLT_DUMP_TRACE diagnostic added during sha256_inline investigation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MatteoMer merged commit 016774a into main Apr 8, 2026
16 checks passed

MatteoMer merged 21 commits into main from worktree-jolt-inline
