Spec: Streaming Prover by sashafrolov · Pull Request #1629 · a16z/jolt

sashafrolov · 2026-06-18T15:24:22Z

Draft spec for the streaming Jolt prover. See specs/jolt-streaming-prover.md.

sashafrolov · 2026-06-18T15:33:11Z

github-actions · 2026-06-18T17:14:28Z

Claude spec review session started: https://claude.ai/code/session_0113TJTBYW1ukcB76q2vvJbB

moodlezoup · 2026-06-18T17:19:16Z

Spec Analysis: Streaming Prover (spec @ 5a8a631)

| Dimension | Score | Gap |
|---|---|---|
| Goal | 0.92 | Clear: crisp objective; Intent and Design agree |
| Constraints | 0.82 | Per-stage "strictly decreases" RAM gate vs. the out-of-scope RAM/IO term needs reconciling |
| Success Criteria | 0.72 | Byte-identity oracle (jolt-core vs. modular-streaming-off) underspecified |
| Spec–Codebase Consistency | 0.95 | All file/type/invariant claims verified accurate |
| Ambiguity | | 16% |

Status: Questions remain — 1 blocking item to resolve before implementation.

I verified the spec's codebase claims against the PR branch (65ff5a8) and jolt-eval on main. Everything checks out: crates/jolt-prover/src/stages/stage0–stage8 (9 stages); cpu/schedule/mod.rs defining StreamingSchedule/HalfSplitSchedule/LinearOnlySchedule; cpu/sumcheck/{kernels,mod.rs} with no schedule consumer; StreamingSumcheckWindow/LinearSumcheckStage traits in jolt-core/src/subprotocols/streaming_sumcheck.rs; streaming.rs in jolt-dory/jolt-witness; LazyTraceIterator in tracer/src/lib.rs; jolt-prover/Cargo.toml's zk/field-inline forwarding pattern; and the referenced invariants (soundness, split_eq_bind, field_mul_scalar) and objectives (prover_time_sha2_chain_100, prover_time_fibonacci_100). No factual contradictions. The modular kernel layer indeed has no SumcheckInstanceProver/SumcheckInstanceParams traits, as claimed.

Questions:

1. [Blocking — Success Criteria] The streaming_proof_byte_identical invariant asserts whole-proof byte equality between the modular streaming prover and jolt-core. Two things make this oracle choice ambiguous as written:

- I found no existing whole-proof byte-identity test between the modular prover and jolt-core. The established parity mechanism in jolt-prover-harness compares at component granularity (ComparisonTarget::{CoreCommitments, CoreStageOutput, CoreOpeningClaims, CoreProofShape} in src/parity.rs, plus the frontier_stage0..8 tests), not serialized proof bytes. So it's unclear whether modular streaming-off is already byte-identical to jolt-core today.
- Using jolt-core as the oracle couples two properties: modular-vs-core parity and streaming-on-vs-off correctness. The most direct oracle that isolates the streaming change is the modular prover with streaming off (same crate stack, same serialization), which the streaming flag's "computation unchanged" criterion already pins down.

Candidate answer — please confirm one: (a) modular streaming-off is already byte-identical to jolt-core and a whole-proof byte test passes today, so jolt-core is a sound oracle; or (b) the oracle should be modular streaming-off for the byte-identity assertion (isolating streaming), with jolt-core parity remaining the existing component-level check. If (a), it would help to name the existing test; if neither holds yet, the criterion may need to drop to component-parity (commitments/claims/shape) rather than whole-proof bytes. This determines exactly what the headline test asserts, so a wrong assumption means the new invariant fails for reasons unrelated to streaming.
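If the headline invariant ends up comparing whole serialized proofs (under either oracle), a failing test is much easier to triage when it reports where the bytes first diverge, since that offset maps back to a stage or component. A minimal sketch, assuming both proofs are already serialized to byte vectors; `first_divergence` is a hypothetical helper, not an existing function in jolt-prover-harness.

```rust
/// Returns the first byte offset at which two serialized proofs differ,
/// or `None` if they are byte-identical (same bytes and same length).
/// Hypothetical helper for a streaming-on vs. streaming-off byte-identity test.
pub fn first_divergence(expected: &[u8], actual: &[u8]) -> Option<usize> {
    // Compare the common prefix byte by byte.
    let common = expected.len().min(actual.len());
    if let Some(i) = (0..common).find(|&i| expected[i] != actual[i]) {
        return Some(i);
    }
    // Identical prefix: the proofs differ only if one is longer than the other.
    if expected.len() != actual.len() {
        Some(common)
    } else {
        None
    }
}
```

A test would then assert `first_divergence(&off_bytes, &on_bytes).is_none()` and print the offset on failure; `off_bytes` and `on_bytes` are placeholder names.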

Suggestions (non-blocking):

- **Per-stage RAM gate vs. out-of-scope memory term.** The acceptance criterion and the Execution "promotion rule" hard-gate each converted stage on within-stage peak RAM that strictly decreases, but Non-Goals exclude "the prover-memory term that is linear in the zkVM's addressable RAM or I/O." A stage whose within-stage peak is dominated by that out-of-scope term may not strictly decrease from streaming the trace alone. Default assumption if unaddressed: the gate is measured against the trace-derived within-stage peak, and for a stage where the non-trace term dominates, "does not increase" counts as passing. Consider stating this explicitly so a legitimately-converted stage isn't blocked (or a regression masked) by the measurement boundary.
- **The modular stack is branch-local.** jolt-prover, jolt-witness, jolt-backends, and jolt-prover-harness exist only on feat/streaming-jolt (+~208k lines), not on main. This is consistent with the spec, but worth noting that /implement-spec must run on this branch; a fresh main checkout has none of the target crates. Default: implementation proceeds on feat/streaming-jolt.
- **Harness already partly exists.** Execution step 1 says "Stand up the per-stage gate harness (the jolt-prover-harness)", but a substantial harness is already present (parity.rs, core_fixture.rs, frontier_stage0..8 tests, metrics.rs). Default: extend the existing harness (add the byte-identity/shape comparison and per-stage peak-RAM logging on top) rather than build a new one. A one-line nod to the existing infra would prevent duplication.
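The default assumption in the first suggestion can be made mechanical, so neither a legitimately converted stage nor a regression slips through on wording. A sketch under exactly that assumption: both peaks are trace-derived within-stage peaks in bytes, and `non_trace_dominates` is a hypothetical per-stage flag marking stages dominated by the out-of-scope RAM/IO term. All names are illustrative.

```rust
/// Outcome of the per-stage memory gate for one converted stage.
#[derive(Debug, PartialEq, Eq)]
pub enum GateResult {
    Pass,
    Fail { baseline: u64, streaming: u64 },
}

/// Per-stage RAM gate under the review's default assumption:
/// - normally the trace-derived within-stage peak must strictly decrease;
/// - if the out-of-scope (addressable RAM / I/O) term dominates the stage,
///   "does not increase" is accepted instead.
pub fn ram_gate(baseline_peak: u64, streaming_peak: u64, non_trace_dominates: bool) -> GateResult {
    let ok = if non_trace_dominates {
        streaming_peak <= baseline_peak
    } else {
        streaming_peak < baseline_peak
    };
    if ok {
        GateResult::Pass
    } else {
        GateResult::Fail { baseline: baseline_peak, streaming: streaming_peak }
    }
}
```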

After updating the spec, re-add the claude-spec-review-request label to re-run this analysis.

Generated by Claude Code

sashafrolov · 2026-06-18T22:26:29Z

I addressed the comments (my local Claude didn't flag these issues). I can't modify the labels on the PR myself.

markosg04

Good work! Most of my comments are just me trying to provide useful context and hopefully anticipate potential footguns with your approach.

markosg04 · 2026-06-19T17:28:36Z

+Space](https://eprint.iacr.org/2025/611): switch every access to the trace
+vector to a lazily generated, streaming view so peak prover RAM scales with
+`√T` instead of `T`, bounding memory to a few GB for arbitrarily long
+executions and making Jolt usable on laptops and other memory-constrained
+machines. The feature is gated behind a new compile-time `streaming` Cargo


I suppose we want the data structure for a particular sum-check / prover component to be either sqrt(T)-sized or a streaming view of the data.
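One way to read this: every prover component either owns at most O(√T) memory or consumes a view that is regenerated on demand. A minimal sketch of the latter, assuming a hypothetical `step(i)` closure that recomputes trace row `i` (standing in for re-running the tracer) and a chunk length of ⌈√T⌉; none of these names come from the spec.

```rust
/// Smallest c with c * c >= t, i.e. ceil(sqrt(t)), corrected with integer checks.
pub fn ceil_sqrt(t: usize) -> usize {
    let mut c = (t as f64).sqrt() as usize;
    while c * c < t {
        c += 1;
    }
    while c > 0 && (c - 1) * (c - 1) >= t {
        c -= 1;
    }
    c
}

/// A streaming view over a length-`len` trace: rows are regenerated on demand
/// by `step` and consumed in chunks of ceil(sqrt(len)) rows, so at most one
/// chunk is resident at a time.
pub struct ChunkedView<F: Fn(usize) -> u64> {
    pub len: usize,
    pub chunk_len: usize,
    step: F,
}

impl<F: Fn(usize) -> u64> ChunkedView<F> {
    pub fn new(len: usize, step: F) -> Self {
        let chunk_len = ceil_sqrt(len).max(1);
        Self { len, chunk_len, step }
    }

    /// Visits every chunk in order, reusing one buffer of capacity `chunk_len`.
    /// Returns the largest number of rows resident at once.
    pub fn for_each_chunk(&self, mut f: impl FnMut(&[u64])) -> usize {
        let mut buf = Vec::with_capacity(self.chunk_len);
        let mut peak = 0;
        let mut start = 0;
        while start < self.len {
            let end = (start + self.chunk_len).min(self.len);
            buf.clear();
            buf.extend((start..end).map(|i| (self.step)(i)));
            peak = peak.max(buf.len());
            f(buf.as_slice());
            start = end;
        }
        peak
    }
}
```

With T = 1000 the chunk length is 32, so at most 32 rows are resident at any time while the visitor still sees all 1000.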

markosg04 · 2026-06-19T17:30:32Z

+3. **Run a whole-prover optimization pass after all 9 stages are converted**,
+   since per-stage 2× guards can compound — optimize until end-to-end prover
+   time is within ~2× of the non-streaming baseline.
+


You may want to consider adding additional mechanical checks (https://github.com/semgrep/semgrep) to make sure agents don't accidentally modify the verifier or other things they shouldn't touch during a slice. Codex/Claude are heavily trained to follow a loop of implement, check, find the next step, and repeat. You might also want to consider per-step commit instructions.
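semgrep rules work at the syntax level; a coarser, path-level guard is also cheap to run after every agent slice. A minimal sketch, assuming the changed-file list comes from something like `git diff --name-only <base>...HEAD`; the protected prefixes are an assumption drawn from this review (the verifier and jolt-sumcheck), not a list taken from the spec.

```rust
/// Path prefixes an agent slice must not touch. Assumed list, drawn from the
/// review comments (verifier, shared sumcheck contract); adjust per slice.
pub const PROTECTED_PREFIXES: &[&str] = &["crates/jolt-verifier/", "crates/jolt-sumcheck/"];

/// Returns every changed path that falls under a protected prefix.
pub fn protected_violations<'a>(changed: &[&'a str]) -> Vec<&'a str> {
    changed
        .iter()
        .copied()
        .filter(|path| PROTECTED_PREFIXES.iter().any(|prefix| path.starts_with(*prefix)))
        .collect()
}
```

In CI this could run as a small binary that exits non-zero whenever the returned list is non-empty, with the prefix list varied per slice.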

markosg04 · 2026-06-19T19:38:25Z

+Route every prover access to the execution trace through a lazily generated,
+parallel-consumable streaming view — gated behind a compile-time `streaming`
+Cargo feature — so that peak prover RAM scales with `√T` rather than `T` while
+the prover emits byte-identical proofs.


(Just pointing this out again; you can probably update the whole spec in one shot.) You may want to adjust this to say that the prover algorithm either allocates sqrt space or uses a streaming view of a linear-space data structure.

markosg04 · 2026-06-19T19:41:14Z

+- Two streaming-aware core algorithms underpin the feature: (1) a **streaming
+  Dory commitment** that computes the commitment's vector-matrix products over
+  trace chunks without materializing the polynomial. The streaming
+  commitment surface exists today
+  ([crates/jolt-dory/src/streaming.rs](crates/jolt-dory/src/streaming.rs)), but
+  the opening path is **not yet fully streaming** — part of it still
+  materializes a witness that should be consumed chunk-wise, which this spec
+  must eliminate;


The PR with the 'modular' jolt-prover stack (which your work should build on) may not have the Dory commitment and vector-matrix-product sqrt-space optimization, but jolt-core does (also see #1632).

While I'm at it, let me clarify: we took jolt-core and 'stripped' it down to jolt-prover-legacy so that we could finish the cutover to the modular stack and unify on the single jolt-verifier. The other jolt-prover line of work, coming soon, introduces the decoupled model of the witness, prover orchestration, and backends/compute primitives.
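For context on the sqrt-space vector-matrix product: if a polynomial's T coefficients are laid out as a √T × √T matrix M, a product such as Lᵀ·M can be accumulated one row at a time, so only the current row and a √T-length accumulator are resident. A sketch over a toy prime field (2^61 - 1) that is not jolt-dory's API; the row iterator stands in for chunks regenerated from the trace.

```rust
/// Toy prime field modulus (2^61 - 1), a stand-in for the real scalar field.
pub const P: u64 = (1u64 << 61) - 1;

fn add(a: u64, b: u64) -> u64 {
    (a + b) % P // a, b < 2^61, so the sum cannot overflow u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Computes v = L^T * M for a row-major `rows x cols` matrix that is only
/// available as a stream of rows, keeping O(cols) state (one row + accumulator).
pub fn streaming_vec_mat<I>(left: &[u64], rows: I, cols: usize) -> Vec<u64>
where
    I: IntoIterator<Item = Vec<u64>>,
{
    let mut acc = vec![0u64; cols];
    for (i, row) in rows.into_iter().enumerate() {
        assert_eq!(row.len(), cols, "row length mismatch");
        // acc += left[i] * row_i
        for (a, &m) in acc.iter_mut().zip(&row) {
            *a = add(*a, mul(left[i], m));
        }
    }
    acc
}
```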

markosg04 · 2026-06-19T19:42:25Z

+- A `streaming` Cargo feature on `jolt-prover` that forwards to the backend
+  capability crates (`jolt-backends`, `jolt-witness`, `jolt-dory`), selecting
+  the streaming code path at compile time via `#[cfg(feature = "streaming")]`
+  exactly as `zk`/`field-inline` forward today


We may want to consider a nice generic compile-time cfg for other backends too, since this streaming backend is the first 'alternative' to the canonical CPU backend that we are introducing (GPU backends are coming soon). This is not too important right now, just something to think about.
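A sketch of what a generic compile-time selection point could look like: prover code is written against a backend trait, and a single `cfg` block maps Cargo features to a concrete type, so a future GPU backend would add one more arm instead of scattering `#[cfg]` through the stages. `ComputeBackend`, `CpuBackend`, and `StreamingCpuBackend` are hypothetical names, not jolt-backends types.

```rust
/// Hypothetical compute-backend trait; jolt-backends' real interface differs.
pub trait ComputeBackend {
    const NAME: &'static str;
    /// Whether this backend keeps the full trace resident in memory.
    fn materializes_trace() -> bool;
}

pub struct CpuBackend;
pub struct StreamingCpuBackend;

impl ComputeBackend for CpuBackend {
    const NAME: &'static str = "cpu";
    fn materializes_trace() -> bool {
        true
    }
}

impl ComputeBackend for StreamingCpuBackend {
    const NAME: &'static str = "cpu-streaming";
    fn materializes_trace() -> bool {
        false
    }
}

// The single place a Cargo feature picks the backend; everything else is
// generic over `ComputeBackend` and only names `SelectedBackend`.
#[cfg(feature = "streaming")]
pub type SelectedBackend = StreamingCpuBackend;
#[cfg(not(feature = "streaming"))]
pub type SelectedBackend = CpuBackend;

/// Example of backend-generic prover-side code.
pub fn describe<B: ComputeBackend>() -> String {
    format!("{} (materializes trace: {})", B::NAME, B::materializes_trace())
}
```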

markosg04 · 2026-06-19T19:47:23Z

+- **`tracer`** — owns the lazy-trace mechanism (`LazyTraceIterator`,
+  [tracer/src/lib.rs](tracer/src/lib.rs)). The streaming trace view is built
+  here; trace ownership stays in `tracer`. The first CPU run is serial, but
+  subsequent recompute passes run in parallel (rayon), so chunk access must be
+  parallelizable and chunked streaming must not serialize the prover's hot
+  loops.


So tracer should hold the actual program emulation implementation (hence the lazy iterator API), while jolt-witness should provide a generic interface for the streaming witness, which we can then implement for tracer. Let's try to maintain this decoupled, generic style so that it becomes simple to swap out the tracer later without needing to reimplement the streaming backend.
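A minimal sketch of that split, assuming a hypothetical `StreamingWitness` trait owned by jolt-witness and a toy tracer adapter that implements it: prover code only sees the trait, and because `chunk` takes `&self` on a `Sync` type, chunks can be regenerated from several threads at once (std scoped threads here in place of rayon). None of these names exist in the codebase.

```rust
use std::thread;

/// Hypothetical interface that jolt-witness could own: a regenerable witness
/// handing out chunks by index. `&self` on a `Sync` type allows parallel access.
pub trait StreamingWitness: Sync {
    type Row;
    fn num_chunks(&self) -> usize;
    /// Regenerates chunk `idx` without materializing the rest of the trace.
    fn chunk(&self, idx: usize) -> Vec<Self::Row>;
}

/// Toy tracer-side adapter: row i is a pure function of i here; a real adapter
/// would replay the emulator from a checkpoint.
pub struct ToyTracer {
    pub len: usize,
    pub chunk_len: usize,
}

impl StreamingWitness for ToyTracer {
    type Row = u64;
    fn num_chunks(&self) -> usize {
        (self.len + self.chunk_len - 1) / self.chunk_len
    }
    fn chunk(&self, idx: usize) -> Vec<u64> {
        let start = idx * self.chunk_len;
        let end = (start + self.chunk_len).min(self.len);
        (start..end).map(|i| (i as u64) * (i as u64)).collect()
    }
}

/// Prover-side consumer written only against the trait: `threads` workers
/// (must be >= 1) each regenerate and sum a disjoint, strided set of chunks.
pub fn parallel_sum<W: StreamingWitness<Row = u64>>(w: &W, threads: usize) -> u64 {
    assert!(threads >= 1);
    let n = w.num_chunks();
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                s.spawn(move || {
                    (t..n)
                        .step_by(threads)
                        .map(|c| w.chunk(c).iter().sum::<u64>())
                        .sum::<u64>()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

Swapping the tracer then means writing a new `StreamingWitness` impl; `parallel_sum` and anything like it stays untouched.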

markosg04 · 2026-06-19T19:48:54Z

+  and witness providers. The prover↔witness interface must offer a streaming
+  oracle, not a fully materialized one, and must allow parallel access
+  to its witness chunks.
+- **`jolt-backends` (`cpu`)** — the CPU streaming compute and the bulk of the


Yeah, in general it's OK to separate out most of the streaming primitives into a completely different CPU backend (say, a streaming backend) if that makes it easier to organize things.

markosg04 · 2026-06-19T19:49:34Z

+  polynomial types. The commitment surface is in place; the remaining work is
+  the opening path, which still materializes a witness that should instead be
+  consumed chunk-wise.
+- **`jolt-sumcheck`** — the shared sumcheck protocol/verifier-contract crate


Again, you may want to use semgrep or other skills/checks to make sure agents don't accidentally touch APIs that in theory shouldn't need to be modified here, such as jolt-sumcheck.

markosg04 · 2026-06-19T19:51:37Z

+3. **Run a whole-prover optimization pass after all 9 stages are converted**,
+   since per-stage 2× guards can compound — optimize until end-to-end prover
+   time is within ~2× of the non-streaming baseline.


I would recommend adding a more detailed set of instructions, or some skills, for specifically how to go about an 'optimization step': exactly how it should be instrumented and over what data (say, sha2-chain at 2^16 to 2^20 or something). Just make sure the 'acceptance' rules are strict, to hopefully avoid some reward hacking.

And you might want to consider using some of these skills for general development:
https://github.com/multica-ai/andrej-karpathy-skills

I've been using them lately and think they are helpful.
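As an illustration of a strict acceptance rule for an optimization step: take repeated timings per trace size, compare medians, and reject if any required size is missing or exceeds the bound, with no averaging across sizes. A sketch assuming timings in seconds for sizes such as 2^16 through 2^20 and the spec's ~2× target as `max_ratio`; the benchmark runner itself (e.g. sha2-chain) is not shown, and `Point`/`accept` are hypothetical names.

```rust
/// Median of timing samples (sorts a copy; panics on empty input or NaN).
pub fn median(samples: &[f64]) -> f64 {
    assert!(!samples.is_empty(), "need at least one sample");
    let mut s = samples.to_vec();
    s.sort_by(|a, b| a.partial_cmp(b).expect("NaN timing"));
    let n = s.len();
    if n % 2 == 1 {
        s[n / 2]
    } else {
        (s[n / 2 - 1] + s[n / 2]) / 2.0
    }
}

/// One benchmark point: log2 of the trace length plus raw timing samples.
pub struct Point {
    pub log_t: u32,
    pub baseline: Vec<f64>,
    pub streaming: Vec<f64>,
}

/// Strict acceptance: every required size must be measured and stay within
/// `max_ratio`; a single failing size rejects the whole step.
pub fn accept(points: &[Point], required_log_t: &[u32], max_ratio: f64) -> Result<(), String> {
    for &log_t in required_log_t {
        let p = points
            .iter()
            .find(|p| p.log_t == log_t)
            .ok_or_else(|| format!("missing measurement for 2^{log_t}"))?;
        let ratio = median(&p.streaming) / median(&p.baseline);
        if ratio > max_ratio {
            return Err(format!("2^{log_t}: streaming/baseline = {ratio:.2} > {max_ratio}"));
        }
    }
    Ok(())
}
```

Requiring the full size list up front is the part that guards against reward hacking: an agent cannot pass by silently dropping the sizes where it regresses.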

markosg04 · 2026-06-19T19:54:43Z

+   within-stage peak RAM strictly decreases; confirm per-stage time regression
+   stays under 2× (an


You can use tools like allocative (https://crates.io/crates/allocative), or roll your own solution with something like size_of, to mechanically check the prover's data structures, and/or standard instrumentation tools to measure RSS. Checking just the actual data structures that scale with T may be better because allocator behavior is unpredictable, though you may need to play around with it and see.
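A roll-your-own version of this check might look like the following: each prover data structure reports the heap bytes it owns (from capacity, not length), and a test asserts the total stays within a constant multiple of √T elements. A sketch only; `HeapBytes`, `StageState`, and the budget factor are hypothetical, and this does not use allocative's API.

```rust
use std::mem::size_of;

/// Roll-your-own alternative to allocative: heap bytes owned by a value.
pub trait HeapBytes {
    fn heap_bytes(&self) -> usize;
}

impl<T> HeapBytes for Vec<T> {
    // Counts capacity rather than len, since capacity is what is allocated.
    fn heap_bytes(&self) -> usize {
        self.capacity() * size_of::<T>()
    }
}

/// Hypothetical per-sumcheck prover state in a streaming stage.
pub struct StageState {
    pub eq_table: Vec<u64>,
    pub chunk_buf: Vec<u64>,
}

impl HeapBytes for StageState {
    fn heap_bytes(&self) -> usize {
        self.eq_table.heap_bytes() + self.chunk_buf.heap_bytes()
    }
}

/// Mechanical check: state fits in `factor * ceil(sqrt(t))` elements of `elem_bytes`.
pub fn within_sqrt_budget(state: &impl HeapBytes, t: usize, elem_bytes: usize, factor: usize) -> bool {
    let sqrt_t = (t as f64).sqrt().ceil() as usize;
    state.heap_bytes() <= factor * sqrt_t * elem_bytes
}
```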

quangvdao · 2026-06-22T02:49:33Z

This spec seems very ambitious, and a lot of details are under-specified. From my past thinking about streaming Jolt, these are the blockers to getting the performance overhead down to something reasonable (I still think <2x is way too optimistic):

- Lazy trace generation means you'll regenerate the trace many times across all stages (say 15-40, depending on how many streaming passes you take over each sum-check). The current tracer is just an interpreter and not very optimized, so you will definitely feel the pain. You should consider writing a better tracer that compiles the ELF to a native binary (ARM or x86), executes it at native speed, and sets up checkpoints from which one can re-expand quickly.
- Spartan is one sum-check that is already streamed today. If you bench its performance, you'll see that it is >2x as soon as you do at least 4 streams over the data. There is some perf on the table for the streaming kernels, but asymptotically you are adding an O(log log n) factor, so expecting at most a 2x slowdown may be unrealistic.
- The sparse sum-checks are the hardest ones to stream. You will need to look into the current implementation strategies for those sum-checks, then generalize them to the streaming setting. I remember encountering several difficult points that I was not able to resolve cleanly (e.g., with minimal perf loss).
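The checkpoint idea from the first point can be sketched independently of native compilation: the first serial run snapshots machine state every K steps, and any later pass re-expands a row range by replaying from the nearest snapshot, so memory holds T/K snapshots instead of T rows and disjoint ranges can be replayed in parallel. The step function below is a toy 64-bit LCG standing in for the emulator, and all names are hypothetical.

```rust
/// Toy deterministic "machine": a single u64 register updated by an LCG step.
/// Stands in for the emulator; real state would include registers and memory.
fn step(state: u64) -> u64 {
    state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407)
}

/// Snapshots taken every `interval` steps during the first (serial) run.
pub struct Checkpoints {
    interval: usize,
    snaps: Vec<u64>,
}

impl Checkpoints {
    /// First run: executes `len` steps from `init`, keeping one snapshot per interval.
    pub fn record(init: u64, len: usize, interval: usize) -> Self {
        assert!(interval > 0);
        let mut snaps = Vec::new();
        let mut s = init;
        for i in 0..len {
            if i % interval == 0 {
                snaps.push(s);
            }
            s = step(s);
        }
        Self { interval, snaps }
    }

    /// Re-expands trace rows `start..end` (row i = state before step i, start < len)
    /// by replaying from the nearest checkpoint at or before `start`.
    pub fn replay(&self, start: usize, end: usize) -> Vec<u64> {
        let k = start / self.interval;
        let mut s = self.snaps[k];
        for _ in k * self.interval..start {
            s = step(s);
        }
        let mut out = Vec::with_capacity(end - start);
        for _ in start..end {
            out.push(s);
            s = step(s);
        }
        out
    }
}
```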

sashafrolov · 2026-06-23T21:01:17Z

I updated the spec in response to all of your feedback. I have slightly downscoped this PR to just be an initial rough implementation with current algorithms, leaving the optimizations to a few subsequent rounds. In this implementation, all sumchecks will restream the witness a logarithmic or log-log number of times (depending on whether windowed sumcheck applies). I have ideas for subsequent rounds to cut down the number of witness regenerations and to use windowed sumcheck in better ways.
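For reference, the baseline that restreams logarithmically many times looks like the classic streaming sum-check: round j makes one pass over the regenerated evaluations f(b), weighting each by ∏_{k<j} eq(b_k, r_k), so n = log T rounds cost n passes and only O(n) extra memory. A sketch for the sum of a single multilinear polynomial over a toy field (2^61 - 1), with caller-supplied challenges instead of Fiat-Shamir; it is not Jolt's kernel code.

```rust
/// Toy prime field modulus (2^61 - 1).
pub const P: u64 = (1u64 << 61) - 1;

fn add(a: u64, b: u64) -> u64 {
    (a + b) % P
}
fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}
fn one_minus(r: u64) -> u64 {
    (1 + P - r) % P
}

/// prod_{k < r.len()} eq(b_k, r_k) over the low bits of index b,
/// where eq(1, r) = r and eq(0, r) = 1 - r.
fn prefix_weight(b: usize, r: &[u64]) -> u64 {
    r.iter().enumerate().fold(1, |w, (k, &rk)| {
        let e = if (b >> k) & 1 == 1 { rk } else { one_minus(rk) };
        mul(w, e)
    })
}

/// One streaming pass for round j = r.len(): returns (g_j(0), g_j(1)).
/// `stream(b)` regenerates f(b) for b in 0..2^n; nothing is stored.
pub fn round_poly(stream: impl Fn(usize) -> u64, n: usize, r: &[u64]) -> (u64, u64) {
    let j = r.len();
    let (mut g0, mut g1) = (0, 0);
    for b in 0..(1usize << n) {
        let t = mul(stream(b), prefix_weight(b, r));
        if (b >> j) & 1 == 0 { g0 = add(g0, t) } else { g1 = add(g1, t) }
    }
    (g0, g1)
}

/// Runs all n rounds against the running claim: n + 2 passes in total (the
/// claimed sum, one per round, and the final evaluation, which a real
/// verifier would obtain from an opening instead).
pub fn prove_and_check(stream: impl Fn(usize) -> u64 + Copy, n: usize, challenges: &[u64]) -> bool {
    assert_eq!(challenges.len(), n);
    let mut claim = (0..(1usize << n)).fold(0, |acc, b| add(acc, stream(b)));
    for j in 0..n {
        let (g0, g1) = round_poly(stream, n, &challenges[..j]);
        if add(g0, g1) != claim {
            return false;
        }
        let rj = challenges[j];
        claim = add(mul(one_minus(rj), g0), mul(rj, g1));
    }
    let f_at_r = (0..(1usize << n)).fold(0, |acc, b| add(acc, mul(stream(b), prefix_weight(b, challenges))));
    claim == f_at_r
}
```

Windowed variants trade more memory per pass for fewer passes; this sketch shows only the one-pass-per-round end of that spectrum.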

markosg04 added 30 commits May 16, 2026 23:48

refactor: infra for splitting of jolt verifier out of core

26dca0a

refactor(blindfold): generic crate and associated infra

b675660

feat: JoltProof in jolt-verifier

0e40cde

feat: typed verify api

3556929

tests: jolt-verifier soundness harness

c37a45c

feat: stage 1

28f5b90

feat: stage 2 and hardening

2c73d0f

stage 3

0f6c53e

feat: stage 4

12514e4

feat: stage 6

78d105e

chore: stack infra

cef6b80

fix: harden refactor audit stack updater

bb8bed1

feat: stage 6

e3cae0c

feat: stage 7

eac54f9

feat: stage 8

6b79dfa

fix: easier committed sumcheck path for jolt-verifier

81ced28

feat: stage 1 zk

6a506d5

feat: stage 3 zk

616b5de

feat: stage 4

5e0e656

feat: stage 4 zk

9a96d3f

feat: stage 5 zk

883f3f5

feat: stage 6 zk

e3ac4c6

feat: stage 7 zk

01c1ccb

feat: blindfold integration

96e4a68

fix: patch core blindfold

9149777

chore: address refactor stack review comments

5a31328

chore: delete comments spec

eb50516

test(jolt-veriifer): statistical tests for zk

7941736

fix: use same transcript for blindfold

09ed244

fix: mzhu's comments + field inline formulas + wrapper r1cs infra

eac9134

markosg04 and others added 16 commits May 22, 2026 18:27

feat: stage 5 field inline wiring

16d5825

feat: stage 6 field inline wiring

99d456c

feat: initial wrapper crate

e131b92

test: hyrax soundness testing

80a708b

feat: grumpkin ec support for dory assist

e558da2

docs: update specs for the prover side

066cb31

feat: define field-inline bytecode anchoring

af58fe5

feat: verify field-inline bytecode read-raf anchoring

3d4154d

feat: bind field-inline bytecode anchoring in BlindFold

9b255b7

feat: validate field-inline bytecode metadata

02cbedf

feat: expose field-inline product lanes

a3d5436

feat: bind field-inline bytecode metadata in preamble

1974556

refactor: avoid field-inline cfg mutation in verifier

7d041cd

feat: jolt prover harness

793ab29

field inline, prover perf

da5a5ce

spec: add streaming prover spec

1695680

github-actions (bot) added the spec (Tracking issue for a feature spec) and implementation (PR contains implementation of a spec) labels Jun 18, 2026

specs: fill in PR number for streaming prover spec

65ff5a8

0xAndoroid added the claude-spec-review-request (Triggers Claude spec analysis) label Jun 18, 2026

moodlezoup removed the claude-spec-review-request (Triggers Claude spec analysis) label Jun 18, 2026, with Claude

update streaming prover spec after claude reviews

28a0950

markosg04 reviewed Jun 19, 2026


Update spec in response to feedback

5154232
