Skip to content

feat: jolt-inline SHA-256 end-to-end (all SHA examples use inlines, verify under Zolt + Jolt)#75

Merged
MatteoMer merged 21 commits into
mainfrom
worktree-jolt-inline
Apr 8, 2026
Merged

feat: jolt-inline SHA-256 end-to-end (all SHA examples use inlines, verify under Zolt + Jolt)#75
MatteoMer merged 21 commits into
mainfrom
worktree-jolt-inline

Conversation

@MatteoMer

Copy link
Copy Markdown
Owner

Summary

Brings the Zolt prover to feature-parity with Jolt's jolt-inlines-sha2: all examples/sha256*.elf now use the inline SHA-256 compression opcode and verify end-to-end under both Zolt and Jolt's own prover (jolt-bench, rev 997c1543).

This PR is the culmination of the worktree-jolt-inline branch (21 commits) — from the initial SDK ELF loader plumbing in 3b611ef1 through the MLE/prefix fixes in the fix(lookup_table): ... series to the final Stage 5/6 correctness work and the fresh guest workspace.

What changes for end users

| Program | Trace Len │ Zolt (ms) │ Jolt (ms) │ Ratio │
|---------------|-----------|-----------|-----------|-------|
| sha256 | 8192 │ 932.03 │ 1267.33 │ 0.74x |
| sha256_128 | 16384 │ 1196.43 │ 1594.30 │ 0.75x |
| sha256_512 | 32768 │ 2075.16 │ 2190.78 │ 0.95x |
| sha256_1024 | 65536 │ 2611.47 │ 2818.48 │ 0.93x |
| sha256_2048 | 131072 │ 4336.91 │ 4555.72 │ 0.95x |
| sha256_inline | 4096 │ 705.16 │ 1099.65 │ 0.64x |
| TOTAL | │ 1185713526 │ 0.88x |

  • sha256_1024.elf went from 9550 ms (old software-SHA guest, trace_length 524288) to 2611 ms (new inline guest, trace_length 65536) — a 3.7× speedup just from switching to the inline expansion.
  • Zolt beats Jolt on every SHA row. Total across the SHA suite: -1669 ms (0.88×).
  • Every SHA ELF contains the custom-0 inline compression marker (verified via objdump).

The headline correctness fix: VirtualRev8W in Stage 6 LookupsRaVirtual

src/zkvm/spartan/stage6_helpers.zig::computeLookupIndex had no case for opcode 0x7B (our internal synthetic for VirtualRev8W, the byte-swap-per-32-bit-half lookup table used by the inline SHA-256 endian conversion). It fell through the default path and returned interleaveBits(0, 0) = 0. Meanwhile stage5_instances.zig::processTraceCycleCombined hand-codes 0x7B as identity-path returning rs1_value, matching Jolt's RISCVCycle<VirtualRev8W>::to_lookup_index (jolt-core/src/zkvm/instruction/virtual_rev8w.rs:46).

Result: the committed InstructionRa polynomials (built from computeLookupIndex) had lookup_index=0 for every VirtualRev8W cycle while Stage 5 opened them as rs1_value, so the Stage 6 LookupsRaVirtual sumcheck was proving consistency between two polynomials that disagreed by design. fib_sdk and sha256_1024 (software SHA) never emit 0x7B, so they stayed green while sha256_inline failed.

The bug was found by a per-round brute-force diagnostic that computed (1 − r_curr) · q_raw(0) + r_curr · q_raw(1) directly from the ra_polys at the very first active LookupsRaVirtual round and compared to the incoming claim:

  • For fib_sdk: matches at round 1 ✓
  • For sha256_inline: diverges at round 1 ✗

Since round 1 is before any sumcheck reduction, the divergence had to be in the inputs — not in the sumcheck algorithm. Diffing the two hand-rolled identity-path opcode sets (processTraceCycleCombined vs computeLookupIndex) immediately surfaced the missing 0x7B case. Fix is 9 lines in stage6_helpers.zig.

New infrastructure: examples/sha2-inline-guests/

A six-member Cargo workspace pinned to the same Jolt rev (997c1543) as jolt-bench, so cargo reuses the already-cached git checkouts. Each guest is a tiny #[jolt::provable] function that hashes a fixed-size zero buffer:

  • guest-64, guest-128, guest-512, guest-1024, guest-2048 — digest of [0u8; N]
  • guest-inline — digest of &[] (reproduces the smallest-possible inline trace, trace_length=4096)

The workspace ships its own linker.ld (ported from jolt-core/src/linker.ld.template with jolt build --mode no-std --backtrace off defaults baked in) and .cargo/config.toml matching the rustflags that zeroos-build::build_binary_with_rustflags applies. Guests build via direct cargo build --release --bin ... — no dependency on whatever jolt CLI is installed locally (which can drift from the pinned Jolt rev).

See examples/sha2-inline-guests/README.md for the full build instructions and the rationale for avoiding the jolt CLI.

jolt-bench patches

  1. jolt-inlines-sha2 dep with host feature. Its #[ctor::ctor] registers the SHA-256 sequence builder at startup. Without it, jolt-bench panics "No inline sequence builder registered for opcode=0x0b" on any inline-SHA ELF. An #[allow(unused_imports)] use jolt_inlines_sha2 as _ in main.rs forces the crate to be linked (otherwise rustc drops it).

  2. MemoryConfig realigned to macro defaults (heap 64 KiB, stack 64 KiB, io/advice 4 KiB each). The previous ad-hoc values (32 MiB/32 MiB/2 MB input/0 advice) only worked for software-SHA guests whose stack/io mismatches were hidden by the lack of MMU bounds checks. Any SDK-built guest has compile-time output_start hardcoded from its #[jolt::provable] attributes via MemoryLayout::new(), and needs the runtime config to match exactly or the tracer panics with "I/O overflow: Attempted to read from 0x7FFxxxxx".

Memory leak fixes (all uncovered while running the SHA regression set)

zig build's gpa leak tracker now reports leaks=0 across every example:

  • stage5_prover.ziglookups_ra_weights was allocated but never passed to LookupsReadRafProver.init (the prover has its own debug-only allocation inside). Dead allocation, removed.
  • stage7_prover.zig G-table parallel reducereduceFn(a, b) merged b into a and returned a, but never freed b's [][]F rows. Reworked LocalG from a bare [][]F into a struct carrying its own Allocator so the reduce function (which has no context parameter) can free merged partials.
  • proving_pipeline.zig — the inner jolt_prover::JoltProver(F) (converter) lazy-allocates _gpu_accel/_gpu_poly/_gpu_msm via enableGpu during the prove path but was never deinit'd. Added defer converter.deinit().

Other small fixes

  • common/jolt_device.zig::remapAddress now returns null for addresses below lowest_address instead of panicking with "Unexpected address". SDK guests can legitimately touch IO/padding addresses that lie below the advice region, and panicking there took down the whole prover.
  • The isPowerOfTwo(n) or n == 0 assertion chain is reordered to n == 0 or isPowerOfTwo(n), because std.math.isPowerOfTwo(0) asserts int > 0 and would panic before the or short-circuits.

Test plan

  • zig build test — passes, no regressions
  • ./bench/run-bench.sh (default 14 programs) — all VERIFIED
  • ./bench/run-bench.sh sha256 sha256_128 sha256_512 sha256_1024 sha256_2048 sha256_inline — all VERIFIED, Zolt wins 6/6
  • All non-SHA baselines (fibonacci, fibonacci_sdk, bitwise, alloc_sdk, collatz) still verify with leaks=0
  • Every examples/sha256*.elf contains the jolt-inline custom-0 compression marker instructions (verified via riscv64-elf-objdump)
  • jolt-bench proves every new SHA ELF without panicking
  • jolt-verifier accepts every proof Zolt generates

🤖 Generated with Claude Code

MatteoMer and others added 21 commits April 7, 2026 00:55
… fix

This change adds infrastructure for Jolt SDK ELFs and the jolt-inlines/sha2
SHA256 precompile, plus the bug fix that makes SDK fibonacci verify against
Jolt's verifier.

Why
- SDK ELFs (compiled with jolt-sdk) introduced new instruction patterns
  (CSRRW/CSRRS/MRET in _start, custom 0x5B advice/host-IO opcodes,
  JAL/JALR with rd=x0 remapped to virtual register 40, FENCE) that the
  prior Zolt prover did not handle.
- Jolt-inline replaces crypto operations in guest programs with custom
  RISC-V opcodes (0x0B); the bytecode preprocessing, lookup tables, and
  trace builder all needed to recognize and expand these.
- The single most important bug for SDK ELF verification was a mismatch
  between the bytecode val_polys (Stage 4 RegistersRWC) and the trace's
  Rs1Ra polynomial: the trace step's `rs1_read` field was false for
  custom 0x5B opcodes (VirtualHostIO/Advice*), but the bytecode entry
  carried rs1=Some(0). The val_poly therefore added eq_table_4[0] for
  these entries while the Stage 4 sumcheck did not, producing diverging
  per-stage claims and Stage 6 sumcheck failure.

Fix
- src/tracer/mod.zig: in stepNormal, set `reads_rs1=true` for custom-2
  opcode 0x5B with funct3 != 0/5 (the SDK Virtual{HostIO,Advice*} family).
  funct3=0/5 are VirtualSRL/SRA which have their own step handlers that
  set rs1_read/rs2_read explicitly, so we leave them alone.
- src/zkvm/proving_pipeline.zig: pass device.memory_layout.termination
  through to the prover-internal BytecodePreprocessing so the synthetic
  termination LUI/ADDI/SD entries match the exporter's. Defensive — does
  not by itself cause Stage 6 failure on the test workloads but keeps
  the two preps byte-for-byte identical.

Other infrastructure (used by jolt-inline and SDK)
- Decoder: handle opcode 0x0B (custom-0 jolt-inline), 0x5B (SDK custom-2
  VirtualHostIO/AdviceLoad/AdviceLen), 0x6B (custom-3 VirtualROTRI/W),
  CSRRW/CSRRS/MRET decoding, and JAL/JALR rd=x0 remapping to vr40.
- bytecode_preprocessing.zig: decompose CSRRW/CSRRS/MRET into
  ADDI/OR/JALR sequences matching Jolt's tracer's inline_sequence
  exactly (vr34..vr39 CSR registers, vr40 temp), and synthetic
  termination JAL with rd=vr40.
- Lookup tables: implement VirtualROTR (idx 27) and VirtualROTRW (idx 28)
  MLE evaluations and materialize entries.
- Lookups + lookup_trace: add Andn / VirtualROTRI / VirtualROTRIW types
  and recorders.
- spartan/bytecode_entries.zig: add ANDN / VirtualROTRI / VirtualROTRIW
  entry population, FENCE/ECALL/CSR/MRET handling in
  populateEntryFromJoltInstruction, JAL rd=0→vr40 remapping in entry
  population, IsLastInSequence post-processing for JALR vsr=0.
- spartan/stage5_*: add 0x6B and 0x5B handling in opcode dispatch.
- r1cs/constraints.zig + trace_witness.zig: 0x5B funct3 guards on the
  compact integer R1CS path, 0x6B/0x0B/0x2B handling.
- tracer/sha256_inline.zig: NEW — SHA256 sequence builder ported from
  jolt-inlines/sha2 (~500 lines), expands a 0x0B SHA256 instruction
  into the ~550 virtual instructions Jolt produces.
- jolt-verifier: pin to a Jolt revision that matches the proof format.

Status
- Plain C ELFs (fibonacci, sha256, sha256_128, sha256_2048, collatz,
  primes_large) verify against Jolt's verifier — unchanged.
- examples/fibonacci_sdk.elf VERIFIES against Jolt's verifier — was
  previously failing at Stage 6 with SumcheckVerificationError.
- examples/sha256_inline.elf still panics in the prover at
  evaluators.zig:1013 (constraint violation in Az/Bz product). This is a
  pre-existing issue inside the inline SHA256 sequence; tracked as a
  follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lRev8W

The R1CS witness's `next_is_virtual` and `is_virtual_opcode` checks treated
opcode 0x5B with funct3=0 or funct3=5 as always-virtual. In upstream Jolt's
custom-2 opcode space, those funct3 values map to first-class ELF instructions
(VirtualRev8W and AdviceLW respectively), not virtual sequences. The bogus
"virtual" classification fired R1CS constraint 18
(MustStartSequenceFromBeginning) on every transition from a real ELF
instruction into one of these custom-2 instructions, and the prover hit the
constraint-satisfaction assertion in interpolateAzBzProductInt while building
the witness for sha256_inline.elf.

Fix:
- src/zkvm/r1cs/trace_witness.zig: drop the 0x5B-funct3=0/5 branches in both
  `is_virtual_opcode` and `next_is_virtual`. The vsr-based check
  (`virtual_sequence_remaining > 0` or `is_last_in_sequence`) already catches
  internal expansions like our SHA256-inline VirtualSRLI sub-steps.

Also add the start of VirtualRev8W support (the byte-swap-each-32-bit-half
helper used by jolt-inlines/sha2's swap_bytes()):
- jolt_instruction.zig + decoder: VirtualRev8W variant decoded from 0x5B
  funct3=0; AdviceLB/H/W/D variants for funct3=3..6; VirtualAdviceLoad as
  the inline-sequence target.
- spartan/bytecode_entries.zig: populateEntryFromJoltInstruction handler for
  VirtualRev8W (lookup table 24, AddOperands+WriteLookupOutputToRD,
  LeftOperandIsRs1Value); SDK NoOp-like handler covers AdviceLB/H/W/D.
- instruction/mod.zig: LookupTables.VirtualRev8W variant + materializeEntry.
- instruction/lookups.zig: VirtualRev8WLookup type.
- instruction/lookup_trace.zig: recordVirtualRev8W.
- lookup_table/mod.zig: VirtualRev8W table struct + evaluateMLE; wired into
  evaluateTableMLE at index 24 (matches Jolt 997c1543's enum ordering).
- tracer/mod.zig: stepVirtualRev8W single-cycle handler that computes
  rd = byte_swap_per_half(rs1) and emits the trace step.

Status:
- All 6 previously-passing ELFs (fibonacci, sha256, collatz, primes_large,
  sha256_128, fibonacci_sdk) still verify.
- examples/sha256_inline.elf no longer panics in the prover; the prover now
  generates a complete proof. Verification still fails at Stage 2 because
  AdviceLW/AdviceLD inline-sequence expansion (VirtualAdviceLoad + SLLI +
  SRAI) is not yet implemented in the bytecode preprocessor or emulator.
  Tracked as a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AdviceL{B,H,W,D} instructions (Jolt SDK custom-2 opcode 0x5B funct3=3..6) are
no longer treated as NoOps in the prover. They now expand to the same virtual
sub-sequence Jolt's tracer produces via inline_sequence():

  AdviceLB/LH/LW (64-bit mode):  VirtualAdvice → VirtualMULI (SLLI) → VirtualSRAI (SRAI)
  AdviceLD:                      VirtualAdvice (single cycle, already 64-bit)

- src/tracer/mod.zig:
  - Emulator gains an `advice_pos` cursor and `adviceTapeRead(num_bytes)` helper
    that consumes bytes from `device.untrusted_advice`, returning 0 once the tape
    is exhausted (valid witness — advice is externally supplied and only range-
    checked by Jolt's proof system).
  - New `stepAdviceLoadSignExt` emits the 3-cycle (or 1-cycle for LD) sequence
    with correct vsr=2,1,0 and is_first=true,false,false flags, matching Jolt's
    finalize() layout. Sign-extension is performed by the shift pair for the
    sub-word loads.
  - `step()` dispatches 0x5B funct3=3..6 to `stepAdviceLoadSignExt`.

- src/zkvm/bytecode_preprocessing.zig: decode-time expansion of .AdviceLB/LH/LW
  into three JoltInstruction entries (VirtualAdvice, VirtualMULI, VirtualSRAI)
  and .AdviceLD into one (VirtualAdvice), so the exported bytecode matches the
  trace's virtual sub-sequence exactly.

- src/zkvm/proving_pipeline.zig: update `computeBytecodeCodeSizeWithTextSize`
  to count 3 entries for AdviceLB/LH/LW and 1 for the other custom-2 variants,
  so `bytecode_K` stays in sync with the expanded bytecode array.

Status:
- All 6 previously-passing ELFs still verify (fibonacci, sha256, collatz,
  primes_large, sha256_128, fibonacci_sdk).
- examples/sha256_inline.elf still fails verification at Stage 2, but the
  prover no longer panics. The remaining Stage 2 failure is in the combined
  InstructionLookupsClaimReduction / SpartanProductVirtualization / Ram* batch
  and needs further investigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Our internal VirtualSRLI (emitted by SRLI decomposition) uses opcode 0x5B
funct3=0 as its synthetic trace-step encoding, which collides with Jolt's
VirtualRev8W instruction that uses the same encoding at the ELF level.

Make stepVirtualRev8W emit a synthetic trace instruction word with opcode
0x7B funct3=0 instead, so Stage 5's getLookupTableIndex and the is_identity_path
classifier can distinguish it from VirtualSRLI without any bytecode-walking
or opcode-overlapping tricks. The serialized bytecode JSON still uses the
variant tag .VirtualRev8W so Jolt's verifier interprets it correctly.

- src/tracer/mod.zig: stepVirtualRev8W builds a synthetic 0x7B funct3=0
  instruction word with rd/rs1 set, rs2/imm=0.
- src/zkvm/spartan/bytecode_entries.zig:
  - populateEntryFromJoltInstruction .VirtualRev8W sets entry.opcode=0x7B
  - getLookupTableIndex(0x7B, ...) → 24 (VirtualRev8WTable)
  - isKnownInstruction recognizes 0x7B
- src/zkvm/spartan/stage5_instances.zig: getLookupTableIndex(0x7B) → 24,
  is_identity_path(0x7B) → true, identity-path lookup index = rs1.
- src/zkvm/r1cs/trace_witness.zig: extractOperandFlags(0x7B) sets
  LeftOperandIsRs1Value + AddOperands + WriteLookupOutputToRD.
  decodeImmediateInt(0x7B) returns 0. computeU128LookupOperandInt(0x7B)
  returns rs1_value. readsRs1 includes 0x7B.

All 6 previously-verifying ELFs still pass. sha256_inline still fails at
Stage 2 — that failure is pre-existing and unrelated to this change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dFlags

The compact R1CS witness path's extractOperandFlags had no case for opcode
0x6B (custom-3, VirtualROTRI/VirtualROTRIW), leaving the per-cycle flags all
false. This silently diverged from the bytecode entry's circuit/instruction
flags for any cycle that executed a VirtualROTRI/W virtual instruction — this
primarily matters for jolt-inline SHA256 which emits hundreds of VirtualROTRI
sub-steps via its inline expansion.

- src/zkvm/r1cs/trace_witness.zig:
  - extractOperandFlags handles 0x6B with LeftOperandIsRs1Value +
    RightOperandIsImm + WriteLookupOutputToRD (interleaved path, matching
    VirtualROTRILookup.instructionFlags()).
  - decodeImmediateInt reconstructs the bitmask from the rotation amount
    stored in the I-type imm field, so Stage 1 val_poly sees the same bitmask
    value the bytecode entry carries (populateVirtualROTRI stores the full
    64-bit bitmask in entry.imm).

All 6 previously-verifying ELFs still pass. sha256_inline still fails at
Stage 2; the remaining failure is not in the 0x6B path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes the ZOLT_DUMP_TRACE diagnostic added during sha256_inline investigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Our stepInline handler for the jolt-inline SHA256 expansion set
rd_written = (!is_store and rd != 0), excluding rd=0 from RdWa tracking.
But Jolt's convention (documented in stepNormal) is that *every* instruction
with an rd field writes to rd, even when rd=0 — Jolt captures cpu.x[0]
before/after execution and treats the (0→0) transition as a write. Excluding
rd=0 causes the Stage 4 Rs1Ra/RdWa sumcheck claims for the SHA256 sub-cycles
to diverge from what Jolt's verifier reconstructs from the serialized bytecode
entries.

- src/tracer/mod.zig: stepInline trace step now sets rd_written=!is_store,
  matching stepNormal.

All 6 previously-verifying ELFs still pass. sha256_inline still fails at
Stage 2 (the 5-way batched sumcheck); the remaining failure is not in the
rd_written path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three independent fixes that bring sha256_inline.elf much closer to
verifying against Jolt's verifier (Stage 6 BCRAF per-stage claims now
all match between prover and bytecode-derived val polys):

1. constraints.zig: hasLookupTable now returns true for opcode 0x7B
   (synthetic VirtualRev8W). Without this the trace witness silently
   skipped flag computation for VirtualRev8W cycles, leaving cf[0]
   AddOperands and cf[6] WriteLookupOutputToRD as zero while the
   bytecode entry had them set.

2. stage5_instances.zig: add 0x7B to left_is_rs1 / lookup-operand
   computation, mirroring VirtualZeroExtendWord (0x42), and split
   getLookupTableIndex's funct3==7 case so ANDN (funct7=0x20) routes
   to table 3 (AndnTable) instead of collapsing into table 2 (AND).
   The two were aliased before, causing LookupTableFlag[2,3] sums to
   diverge between the trace polynomial and the bytecode val polys.

3. bytecode_preprocessing.zig: in the SHA256 inline expansion path,
   convert the shamt stored in InlineInstr.imm into the multiplier
   (1 << shamt) before populating the VirtualMULI bytecode entry,
   matching Jolt's VirtualMULI::operands.imm convention and the
   standalone SLLI->VirtualMULI handler.

Stage 2 verification still fails for sha256_inline (RWC and
OutputSumcheck instances disagree), but ProductVirt/InstrLookups/RAF
plus all five Stage 6 BCRAF stages now match.
The SHA256 inline expansion in stepInline was calling self.ram.read /
self.ram.write directly, which bypassed the I/O-region check.  When the
host program passes an output-region pointer (e.g. 0x7fffb000) as the
state pointer to the SHA256 inline opcode, the resulting hash bytes were
written to plain RAM instead of being captured into device.outputs.
That left val_final[output_index] != val_io[output_index] when the
verifier reconstructs the IO polynomial via eval_io_mle, which is what
breaks Stage 2's RamReadWriteChecking and OutputSumcheck for
sha256_inline.

Switch the inline SD path to writeWordWithIO and the inline LD path to
readWordWithIO so that I/O-region accesses follow the same code path as
regular SDs/LDs.
The SB/SH/SW decomposition in stepSubWordStore was bypassing the IO
check by calling self.ram.write directly for the final SD step. That
left device.outputs empty for any program that writes its result via
sub-word stores (which is what the Rust SDK does for typical small
return values), so:

1. The trace's val_final and the verifier's val_io disagreed at the
   output region for any program with non-zero outputs (which is why
   sha256_inline failed Stage 2's OutputSumcheck and RamRafEvaluation).
2. The Fiat-Shamir preamble couldn't see the actual program output.

Fixes:
- writeWordWithIO now routes writes to BOTH device.outputs (for the
  preamble + val_io) AND ram.write (so val_final and subsequent
  read-modify-write LD steps observe the same byte values).
- stepSubWordStore's SB/SH and SW SD-finalization steps now go through
  writeWordWithIO instead of self.ram.write directly.
- proveJoltCompatibleWithDoryAndSrsAtAddress trims trailing zero bytes
  off the outputs slice before passing it to the JoltDevice. Jolt's
  RV64IMACVerifier::new applies the same truncation
  (program_io.outputs.truncate(...)) before running its preamble; if
  the prover doesn't trim too, the prover/verifier transcripts diverge
  for any program whose output ends in zero bytes (e.g. fib_sdk
  computing fib(0)=0, which used to "verify" only because the
  pre-fix outputs were always empty).
- JoltProofWithDory now carries program_inputs / program_outputs /
  program_panic so the prove command can write a `<proof>.io`
  ark-serialize sidecar containing the JoltDevice next to the proof.
- jolt-verifier reads `<proof>.io` (when present) and uses the
  deserialized JoltDevice as program_io for verification, replacing
  the previous "always empty default" behaviour. Falls back to the
  old default when no sidecar exists, preserving compatibility.

After this commit:
- All previously verifying ELFs (fibonacci, fibonacci_sdk, alloc_sdk,
  bitwise, collatz, factorial, gcd, primes, signed, sum, sha256_128,
  sha256_1024, primes_large) still verify.
- sha256_inline.elf now passes Stage 2's RAF, Output, Product, and
  InstructionLookups instances and the Stage 6 BCRAF claims; the only
  remaining mismatch is Stage 2 RamReadWriteChecking (inst[0]). That
  is the next thing to chase.
In the inline instruction sequence handler for SD, memory_pre_value
was read from ram.memory AFTER writeWordWithIO had already stored
the new doubleword, so the TraceStep recorded the POST value instead
of the pre-value. This corrupted the Stage 6 RamInc polynomial
(ram_inc = memory_value - memory_pre_value = post - post = 0) and
caused Stage 2 RamReadWriteChecking verification to fail for
sha256_inline. Capture the pre-value before the write.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…EN inputs

The verifier-style expected_output_claim path in InstructionReadRaf calls
LookupTable.evaluateTableMLE for every table flag opening. UpperWord (table
13) was a panic, which made it impossible to drive the prover-side
expected-claim diagnostic for any program that uses MULHU.

Pow2 (table 21) is a single-operand identity-path table whose existing
struct method asserts r.len == XLEN, but Jolt's evaluate_mle takes the full
2 * XLEN interleaved opening point. Add a length-aware dispatch that runs
Jolt's product formula when r.len == 2*XLEN and falls back to the legacy
single-operand variant otherwise.

Also raise @setEvalBranchQuota in the few MLEs whose `inline for` over XLEN
exceeds the comptime branch limit when called from runtime-dispatched paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VirtualSRA.evaluateMLE was using r[2*(XLEN-1)] as the sign bit and folding
the sign-extension contribution inside the iteration loop. Jolt's
evaluate_mle (jolt-core/src/zkvm/lookup_table/virtual_sra.rs) uses r[0]
as the sign bit and accumulates a separate sign_extension term weighted
by 2^i (skipping i == 0). Rewrite to match Jolt exactly.

ShiftRightBitmask.evaluateMLE assumed a single-operand `r.len == XLEN`
input and brute-forced the sum, while Jolt's evaluate_mle expects
`r.len == 2 * XLEN` and reads the LAST log2(XLEN) elements as the
encoded shift amount. Add a 2*XLEN path that mirrors Jolt and keep the
legacy path for the 8-bit test harness.

These functions are currently only reachable via the verifier-style
expected-claim diagnostic path (evaluateTableMLE), but were silently
producing wrong values any time the diagnostic was run on a program
whose Stage 5 cycles exercised these tables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The local `rev8w` helper in src/zkvm/lookup_table/prefixes.zig only
processed the lower 32 bits of its u64 input — it masked off the upper
half before byte-swapping. Jolt's tracer::instruction::virtual_rev8w::rev8w
reverses bytes within EACH 32-bit half independently and concatenates them
back into a u64. This helper is consumed by the Rev8W prefix MLE and its
checkpoint update, so any prefix evaluation involving bits in the upper
half of the lookup index produced a value different from Jolt's, causing
Stage 5 InstructionReadRaf to disagree with the verifier for any program
that exercises VirtualRev8WTable (table 24). sha256_inline triggers this
because the inline SHA256 expansion emits VirtualRev8W cycles for endian
conversion.

Match Jolt's full-u64 implementation:
  (v as u32).swap_bytes() as u64 + (((v >> 32) as u32).swap_bytes() as u64) << 32

Verified via the existing REMAT_VS_MLE diagnostic: with this fix,
stored_table_values[24] computed via prefix/suffix decomposition now
agrees with the direct MLE evaluation at r_address for both fib_sdk
(unused → was canceling) and sha256_inline (used → was bug).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…MLEs

Jolt's `rightShiftWPrefixMle` and `leftShiftWHelperPrefixMle` use Rust's
`F::from_u32(1 << y.leading_ones())` and `F::from_u32(u32::from(x) >> y.trailing_zeros())`.
In release mode, `u32` shift counts are masked mod 32, so `1u32 << 32` wraps
to `1u32 << 0 = 1` (NOT 0) and `x >> 32` wraps to `x >> 0 = x`.

Zolt previously had conservative `if (shift >= 64) return result` / `else 0`
branches which zero out the result at those boundaries — diverging from Jolt's
wrapping behavior when `y.leading_ones() == 32` (y is all-ones in its bit
width) or `y.trailing_zeros() == 32` (y == 0).

This alignment doesn't fix sha256_inline's Stage 5 failure on its own, but it
removes a known-divergence from the Jolt reference that could quietly corrupt
prefix MLE evaluations for any trace that happens to hit these edge cases.

All existing regressions (fibonacci_sdk, fibonacci, bitwise, alloc_sdk,
collatz, sha256_1024) continue to verify.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gated behind ZOLT_S5_DEBUG=1 env var, dumps at the end of Stage 5 lookups:
  - current_claim (tracked through sumcheck round Horner updates)
  - combined_vals[0], lookups_current_scalar, E_in[0], E_out[0], ra_product
  - self_check = scalar * E_in[0] * E_out[0] * combined_vals[0] * Π ra_chunks

If the polynomial state is internally consistent with the round polynomial
chain, self_check must equal current_claim. For sha256_inline this invariant
currently fails (self_check_eq_current=false), pinpointing a per-round
computation bug somewhere in proverMsgReadChecking / proverMsgRaf — the next
step for the Stage 5 InstructionReadRaf investigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ord, VirtualROTRW

Adds Zig ports of Jolt's `prefix_suffix_test::<XLEN, F, T>()` (jolt-core
test.rs:49) for the three tables that sha256_inline uses and that fib_sdk /
sha256_1024 do not — these were the prime suspects for the Stage 5
InstructionReadRaf drift.

The test walks 50 random lookup indices through all 8 × 16 = 128 address
rounds, binding prefix bits incrementally, and at each round compares
  combine(prefix_evals, suffix_evals) against the table MLE evaluated
at the reconstructed 128-element field point. A mismatch would pinpoint a
bug in either the prefix MLE, the suffix MLE, the combine formula, or the
prefix checkpoint updates.

Results:
- Andn (table 3):            PASS
- LowerHalfWord (table 19):  PASS
- VirtualROTRW (table 28):   PASS (with bitmask-format index generator
    matching Jolt's `gen_bitmask_lookup_index`; VirtualROTRW's MLE is only
    well-defined when the right operand y is a contiguous-leading-ones
    bitmask, so the generator restricts to that subset.)

This narrows the sha256_inline Stage 5 drift to components OTHER than
these three tables' address-round prefix-suffix decomposition. Remaining
suspects: RAF (left/right/identity prefix MLEs), rematerialization at
round 128, cycle-round polynomial computation, or a prefix checkpoint
update for a table that is mid-chain.

Also fixes a stale test in src/tracer/sha256_inline.zig that referenced
InstrKind.SRLI (removed in favor of VirtualSRLI) — this was blocking
`zig build test` entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ndHalfWord

Adds the SignExtendHalfWord (table 20) prefix-suffix decomposition test
alongside the Andn / LowerHalfWord / VirtualROTRW tests. sha256_inline
hits this table via its VirtualSignExtendWord (0x0B) cycles inside the
SRLIW decomposition. Test passes — ruling out this table as the source
of the Stage 5 drift.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs the full address-rounds sumcheck at small scale (total_len=8, chunk_len=2,
4 phases × 2 rounds) for a single random cycle and verifies that:
  left_ps.bound_value   == OperandPolynomial::Left::evaluate(r_challenges)
  right_ps.bound_value  == OperandPolynomial::Right::evaluate(r_challenges)
  identity_ps.bound_value == IdentityPolynomial::evaluate(r_challenges)

Covers 30 random trials. Passes, confirming that RafDecomposition's
prefix_mle initialization + bind correctly tracks the three RAF polynomials
through a complete phased sumcheck.

This is weaker than the per-round test (it only checks the final bound
value, not intermediate round polynomials), but it's a strong sanity check
that the prefix cached-table approach is correct. Sha256_inline's Stage 5
drift must live in the round-polynomial computation (proverMsgRaf), the
multi-cycle Q-poly aggregation, cycle rounds, or rematerialization — not
in the single-cycle RAF bound-value chain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r sha256_inline

The SHA-256 inline expansion folds K[i]/BLOCK[i] constants and emits ADDI
with the full 32-bit constant in the immediate field of `InlineInstr.imm`.
When stepInline encoded these as RISC-V instruction words, the 12-bit ADDI
imm field truncated the constant to its lower 12 bits sign-extended, so
the trace's `instruction` field could no longer reproduce the actual rd
value (rd = rs1 + full_imm).

Stage 5's InstructionReadRaf brute force exposed the symptom: for sha256_inline
the prover's `materializeTableEntry(table, lookup_idx)` (computed from the
truncated 12-bit imm) disagreed with `step.rd_value` (computed from the full
imm) on 64 cycles — exactly the K[i] folded constants. fib_sdk and the other
non-inline baselines have 0 such mismatches.

Schema-level fix:
- Add `inline_full_imm: u64 = 0` and `has_full_imm: bool = false` to TraceStep.
  stepInline sets both for ADDI/XORI/ANDI emitted from sha256_inline.

Imm-decoding paths updated to honour the override:
- `decodeImmediateInt` (trace_witness.zig integer path) — used by the raw
  R1CS witness Imm and RightInstructionInput fields.
- `computeU128LookupOperandInt` (trace_witness.zig) — feeds Stage 5
  RightLookupOperand for identity-path AddOperands.
- `processTraceCycleCombined` in stage5_instances.zig — Stage 5
  `lookup_indices_u128`, `right_op`, and the `imm_val` used for
  RightInstructionInput.
- `computeLookupIndex` in stage6_helpers.zig — used by the InstructionRa
  one-hot polynomial commitment (proving_pipeline.zig:811), Stage 6
  LookupsRaVirtual prover, Stage 6 Booleanity prover, and Stage 7.
- `populateEntryFromJoltInstruction` in bytecode_entries.zig — restores
  the full u64 imm on the bytecode entry after `populateEntryFromInstruction`
  decodes only the truncated 12 bits from the synthetic instruction word.

Independent off-by-one fix surfaced by the brute force:
- `LookupTable.materializeTableEntry` had odd/even bit positions swapped
  AND had stale table indices (14, 19-28, 30 were shifted relative to
  `getLookupTableIndex`/`evaluateTableMLE`). Rewrote both bit extraction
  to match Jolt's `uninterleave_bits` (LEFT = ODD bits, RIGHT = EVEN bits)
  and the table dispatch to match the production indices. Function is
  diagnostic-only — bug was dormant.

Diagnostics added (env-gated, off in production):
- `ZOLT_S5_BRUTE` (stage5_lookups.zig + stage5_prover.zig): brute-force
  reconstruction of the InstructionReadRaf round-0 polynomial from cycle
  data with per-table breakdown, plus per-cycle materializeTableEntry
  vs trace.rd_value comparison.
- `ZOLT_S6_DEBUG` (stage6_prover.zig): per-instance Stage 6 input claims
  and per-cycle bytecode-entry-imm vs trace-step-inline_full_imm sanity check.

Dead diagnostic code removed in streaming_outer.zig and stage3_prover.zig
(referenced no-longer-existing struct fields, blocking compilation when
debug_verbose=true).

Status:
- sha256_inline: Stage 5 InstructionReadRaf now passes (eval_0/eval_1/claim
  all match brute force, 0 PERCYCLE mismatches across 4096 cycles). Stage 6
  batched sumcheck still fails — separate root cause that only became
  reachable after the Stage 5 fix.
- All baselines verify: fibonacci_sdk, fibonacci, bitwise, alloc_sdk,
  collatz, sha256_1024.
- `zig build test` passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All `examples/sha256*.elf` are now Jolt SDK guest binaries that call
`jolt_inlines_sha2::Sha256::digest` and verify end-to-end under both
Zolt and Jolt's native prover (`jolt-bench`, rev 997c1543).

## Core correctness fix: VirtualRev8W in Stage 6 LookupsRaVirtual

`stage6_helpers.zig::computeLookupIndex` had no case for opcode 0x7B
(our internal synthetic for `VirtualRev8W`, used by the inline SHA-256
expansion for endian conversion). It fell through the default path,
returning `interleaveBits(0, 0) = 0`. Meanwhile `stage5_instances.zig`
hand-codes 0x7B as identity-path returning `rs1_value`, matching Jolt's
`RISCVCycle<VirtualRev8W>::to_lookup_index`. Result: the committed
`InstructionRa` polynomials had lookup_index=0 for every VirtualRev8W
cycle while Stage 5 opened them as `rs1_value`, so the Stage 6
LookupsRaVirtual sumcheck was reducing two inconsistent polynomials.
sha256_inline hit this at round 1 of the LookupsRaVirtual instance;
fib_sdk/sha256_1024 never emitted 0x7B so they were unaffected.

Found by a per-round brute-force diagnostic that computed
`(1-r_curr)*q_raw(0) + r_curr*q_raw(1)` directly from the ra_polys at
the first active round and compared to the incoming claim. For
fib_sdk they matched, for sha256_inline they diverged at round 1 ─
before any sumcheck reduction, pointing at the inputs. Diffing the
two hand-rolled identity-path opcode sets surfaced the missing 0x7B.

## New inline SHA-256 guest workspace

`examples/sha2-inline-guests/` is a six-member Cargo workspace
(`guest-64/128/512/1024/2048/inline`) pinned to the same Jolt rev as
`jolt-bench`, so cargo reuses the already-cached git checkouts. Each
guest is a tiny `#[jolt::provable]` function that hashes a fixed-size
zero buffer. The workspace ships its own `linker.ld` (ported from
jolt-core's `linker.ld.template`) and `.cargo/config.toml` matching
the rustflags `zeroos-build` applies when `jolt build --mode no-std`
runs, so we build guests via direct `cargo build --release --bin ...`
without depending on whatever `jolt` CLI is installed locally.

The built ELFs replace the previously-committed software-SHA binaries.
`sha256_1024.elf` went from 9550 ms (software SHA of 1024 bytes,
trace_length 524288) to 2611 ms (inline SHA, trace_length 65536) ─
a 3.7× speedup from the inline expansion alone.

## jolt-bench patches

`jolt-inlines-sha2` is now a dependency with the `host` feature ─
its `#[ctor::ctor]` registers the SHA-256 sequence builder at startup,
without which the tracer panics "No inline sequence builder
registered for opcode=0x0b". An `#[allow(unused_imports)] use
jolt_inlines_sha2 as _` in main.rs forces the crate to be linked.

`MemoryConfig` now uses the jolt-sdk macro's default sizes (heap/stack
64 KiB, io/advice 4 KiB each). The historical 32 MiB/32 MiB/2 MB
values only worked for software-SHA guests whose stack/io mismatches
were hidden by the lack of MMU bounds checks; any SDK-built guest has
compile-time `output_start` hardcoded from its `#[jolt::provable]`
attributes and needs the runtime config to match.

## Memory leak fixes

Three unrelated leaks uncovered by `zig build`'s gpa leak tracker
while running the new regression set:

- `stage5_prover.zig` allocated `lookups_ra_weights` but never passed
  it to `LookupsReadRafProver.init` — dead allocation.
- `stage7_prover.zig`'s parallel G-table reduce merged right into
  left and returned left, leaking right's `[][]F` rows every time.
  Reworked `LocalG` as a struct carrying its own `Allocator` so the
  reduce function (which has no context parameter) can free merged
  partials.
- `proving_pipeline.zig` never called `deinit()` on the inner
  `jolt_prover::JoltProver(F)` that lazy-allocates
  `_gpu_accel`/`_gpu_poly`/`_gpu_msm` in `enableGpu`. Added a
  matching `defer converter.deinit()`.

## Other small fixes

- `common/jolt_device.zig::remapAddress` now returns `null` for
  addresses below `lowest_address` instead of panicking, so SDK
  guests whose traces touch IO/padding addresses don't crash the
  prover.
- `isPowerOfTwo(0)` is asserted by `std.math`, so the advice-size
  zero check needs to come first in the `or` chain (Zig short-
  circuits). Reordered.

## Verification

Full regression under `zig build test` + `./bench/run-bench.sh`:

  Program       │  Cycles │ Trace Len │  Zolt (ms) │  Jolt (ms) │   Ratio
  ──────────────┼─────────┼───────────┼────────────┼────────────┼────────
  sha256        │    6647 │      8192 │     932.03 │    1267.33 │   0.74x
  sha256_128    │    9375 │     16384 │    1196.43 │    1594.30 │   0.75x
  sha256_512    │   27412 │     32768 │    2075.16 │    2190.78 │   0.95x
  sha256_1024   │   50852 │     65536 │    2611.47 │    2818.48 │   0.93x
  sha256_2048   │   97741 │    131072 │    4336.91 │    4555.72 │   0.95x
  sha256_inline │    3690 │      4096 │     705.16 │    1099.65 │   0.64x
  TOTAL                              │   11857.16 │   13526.26 │   0.88x

All non-SHA baselines (fibonacci, fibonacci_sdk, bitwise, alloc_sdk,
collatz) still verify with zero leaks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatteoMer MatteoMer merged commit 016774a into main Apr 8, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant