Spec: Streaming Prover #1629
@claude analyze

Claude spec review session started: https://claude.ai/code/session_0113TJTBYW1ukcB76q2vvJbB
Spec Analysis: Streaming Prover (spec @
Status: Questions remain — 1 blocking item to resolve before implementation. I verified the spec's codebase claims against the PR branch ( Questions: 1. [Blocking — Success Criteria] The
Candidate answer — please confirm one: (a) modular streaming-off is already byte-identical to Suggestions (non-blocking):
Generated by Claude Code |
I fixed the comments (my local Claude didn't flag these issues). I can't modify the tags on the PR myself.
markosg04 left a comment
Good work! Most of my comments are just me trying to provide useful context and hopefully predict any potential footguns with your approach
> Space](https://eprint.iacr.org/2025/611): switch every access to the trace
> vector to a lazily generated, streaming view so peak prover RAM scales with
> `√T` instead of `T`, bounding memory to a few GB for arbitrarily long
> executions and making Jolt usable on laptops and other memory-constrained
> machines. The feature is gated behind a new compile-time `streaming` Cargo

I suppose we want the data structure for a particular sum-check / prover component to be either sqrt(T)-sized or a streaming view of the data.
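To make the "sqrt(T)-sized or streaming view" distinction concrete, here is a minimal sketch of a fold that consumes a regenerable trace chunk by chunk, so the only buffer it holds has about `√T` rows. Everything in it (`Row`, `regenerate`, `chunk_len`, `streaming_sum`) is a hypothetical stand-in, not part of Jolt or the spec; `regenerate` plays the role of re-running the emulator over a range of cycles.

```rust
/// Hypothetical stand-in for one execution-trace row.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Row {
    pub cycle: u64,
    pub value: u64,
}

/// Regenerates rows `start..end` on demand (stand-in for re-running the
/// emulator over that range). Nothing is stored between calls.
pub fn regenerate(start: usize, end: usize) -> impl Iterator<Item = Row> {
    (start..end).map(|i| Row {
        cycle: i as u64,
        value: (i as u64).wrapping_mul(3).wrapping_add(1),
    })
}

/// Returns a chunk length `c >= 1` with `c * c >= t`, i.e. roughly `sqrt(t)`.
pub fn chunk_len(t: usize) -> usize {
    let mut c = (t as f64).sqrt() as usize;
    while c * c < t {
        c += 1;
    }
    c.max(1)
}

/// Folds over all `t` rows while holding at most one chunk in memory.
/// Returns the sum of `value` and the largest buffer length ever held.
pub fn streaming_sum(t: usize) -> (u64, usize) {
    let c = chunk_len(t);
    let mut buf: Vec<Row> = Vec::with_capacity(c);
    let mut acc = 0u64;
    let mut peak = 0usize;
    let mut start = 0;
    while start < t {
        let end = (start + c).min(t);
        buf.clear(); // reuse the same sqrt(T)-sized allocation
        buf.extend(regenerate(start, end));
        peak = peak.max(buf.len());
        acc = buf.iter().fold(acc, |a, r| a.wrapping_add(r.value));
        start = end;
    }
    (acc, peak)
}
```

The result matches a fold over the fully materialized trace, but the peak buffer length is the chunk length rather than `T`.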
> 3. **Run a whole-prover optimization pass after all 9 stages are converted**,
> since per-stage 2× guards can compound — optimize until end-to-end prover
> time is within ~2× of the non-streaming baseline.

You may want to consider adding mechanical checks (e.g. Semgrep, https://github.com/semgrep/semgrep) to make sure agents don't accidentally modify the verifier or other unwanted things during a slice. Codex and Claude are heavily trained to follow a loop of implement, check, find the next step, and repeat. You might want to add per-step commit instructions too.
> Route every prover access to the execution trace through a lazily generated,
> parallel-consumable streaming view — gated behind a compile-time `streaming`
> Cargo feature — so that peak prover RAM scales with `√T` rather than `T` while
> the prover emits byte-identical proofs.

(Just pointing this out again; you can probably update the whole spec in one shot.) You may want to adjust this to say that the prover algorithm either allocates sqrt space or uses a streaming view of a linear-space data structure.
> - Two streaming-aware core algorithms underpin the feature: (1) a **streaming
> Dory commitment** that computes the commitment's vector-matrix products over
> trace chunks without materializing the polynomial. The streaming
> commitment surface exists today
> ([crates/jolt-dory/src/streaming.rs](crates/jolt-dory/src/streaming.rs)), but
> the opening path is **not yet fully streaming** — part of it still
> materializes a witness that should be consumed chunk-wise, which this spec
> must eliminate;

Perhaps the PR with the 'modular' jolt-prover stack (which your work should build on) doesn't have the Dory commit and vector-matrix product sqrt-space optimization, but jolt-core does (also see: #1632).

While I'm at it, let me clarify: we took jolt-core and 'stripped' it down to jolt-prover-legacy so that we could finish the cutover of the modular stack and unify on the single jolt-verifier. The other jolt-prover line of work is coming soon; it introduces the decoupled model of the witness, prover orchestration, and backends/compute primitives.
> - A `streaming` Cargo feature on `jolt-prover` that forwards to the backend
> capability crates (`jolt-backends`, `jolt-witness`, `jolt-dory`), selecting
> the streaming code path at compile time via `#[cfg(feature = "streaming")]`
> exactly as `zk`/`field-inline` forward today

We may want to consider a nice generic compile-time cfg for other backends too, since this streaming backend is the first 'alternative' to the canonical CPU backend we are introducing (GPU backends are coming soon). This is not too important right now, just something to think about.
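As a rough illustration of compile-time backend selection, the sketch below picks one of two modules behind a single name using the `streaming` feature from the spec. The module name `backend`, its items, and `active_backend` are hypothetical; a real design for several backends would presumably need more than a single boolean feature.

```rust
// Selected when the crate is built with `--features streaming`.
#[cfg(feature = "streaming")]
mod backend {
    pub const NAME: &str = "cpu-streaming";
    /// Rows held at once: about sqrt(T) in the streaming path.
    pub fn peak_rows(t: usize) -> usize {
        (t as f64).sqrt().ceil() as usize
    }
}

// Default path: the canonical CPU backend materializes the whole trace.
#[cfg(not(feature = "streaming"))]
mod backend {
    pub const NAME: &str = "cpu";
    /// Rows held at once: all T of them.
    pub fn peak_rows(t: usize) -> usize {
        t
    }
}

/// Callers see one API; the cfg attributes above decide which body exists.
pub fn active_backend() -> &'static str {
    backend::NAME
}

/// Number of trace rows the active backend keeps resident for a trace of length `t`.
pub fn resident_rows(t: usize) -> usize {
    backend::peak_rows(t)
}
```

Because exactly one `backend` module exists in any given build, call sites need no `cfg` attributes of their own.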
> - **`tracer`** — owns the lazy-trace mechanism (`LazyTraceIterator`,
> [tracer/src/lib.rs](tracer/src/lib.rs)). The streaming trace view is built
> here; trace ownership stays in `tracer`. The first CPU run is serial, but
> subsequent recompute passes run in parallel (rayon), so chunk access must be
> parallelizable and chunked streaming must not serialize the prover's hot
> loops.

So tracer should hold the actual program emulation implementation (hence the lazy iterator API), while jolt-witness should provide a generic interface for the streaming witness, which we can then implement for tracer. Let's try to maintain this decoupled, generic style so that it becomes simple to swap the tracer later without having to re-implement the streaming backend.
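One possible shape for that decoupling, as a sketch only: a trait that the witness crate could own, an implementation supplied by the tracer side, and prover-side code that is generic over the trait. `StreamingWitness`, `ToyTracer`, and `parallel_sum` are hypothetical names, not existing Jolt APIs, and `std::thread::scope` stands in for rayon to keep the example dependency-free.

```rust
use std::thread;

/// Hypothetical interface the witness crate could expose to the prover.
/// `Sync` lets several worker threads pull chunks from one shared witness.
pub trait StreamingWitness: Sync {
    type Row: Send;
    fn num_rows(&self) -> usize;
    /// Rows per chunk; assumed to be non-zero.
    fn chunk_len(&self) -> usize;
    /// Regenerates chunk `idx` from scratch; safe to call concurrently.
    fn chunk(&self, idx: usize) -> Vec<Self::Row>;
    fn num_chunks(&self) -> usize {
        let c = self.chunk_len();
        (self.num_rows() + c - 1) / c
    }
}

/// Hypothetical tracer-backed implementation that re-derives rows on demand.
pub struct ToyTracer {
    pub rows: usize,
    pub chunk: usize,
}

impl StreamingWitness for ToyTracer {
    type Row = u64;
    fn num_rows(&self) -> usize {
        self.rows
    }
    fn chunk_len(&self) -> usize {
        self.chunk
    }
    fn chunk(&self, idx: usize) -> Vec<u64> {
        let start = idx * self.chunk;
        let end = (start + self.chunk).min(self.rows);
        (start..end).map(|i| (i as u64) * 2 + 1).collect()
    }
}

/// Prover-side code depends only on the trait, so the tracer can be swapped.
/// Worker `k` handles chunks k, k + workers, k + 2 * workers, ...
pub fn parallel_sum<W: StreamingWitness<Row = u64>>(w: &W, workers: usize) -> u64 {
    let n = w.num_chunks();
    let workers = workers.max(1);
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|k| {
                s.spawn(move || {
                    (k..n)
                        .step_by(workers)
                        .map(|idx| w.chunk(idx).into_iter().sum::<u64>())
                        .sum::<u64>()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}
```

Swapping the tracer then means writing another `impl StreamingWitness`, with `parallel_sum` and any other trait-generic prover code left untouched.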
> and witness providers. The prover↔witness interface must offer a streaming
> oracle, not a fully materialized one, and must allow parallel access
> to its witness chunks.
> - **`jolt-backends` (`cpu`)** — the CPU streaming compute and the bulk of the

Yeah, in general it's OK to separate out most of the streaming primitives into a completely different CPU backend (say, a streaming-backend) if that makes it easier to organize things.
> polynomial types. The commitment surface is in place; the remaining work is
> the opening path, which still materializes a witness that should instead be
> consumed chunk-wise.
> - **`jolt-sumcheck`** — the shared sumcheck protocol/verifier-contract crate

Again, you may want to use Semgrep or other skills / checks to make sure agents don't accidentally touch APIs that in theory don't need to be modified here, such as jolt-sumcheck.
> 3. **Run a whole-prover optimization pass after all 9 stages are converted**,
> since per-stage 2× guards can compound — optimize until end-to-end prover
> time is within ~2× of the non-streaming baseline.

I would recommend adding a more detailed set of instructions, or some skills, for exactly how to go about an 'optimization step': how it should be instrumented and over what data (say, sha2-chain at 2^16 to 2^20 or so). Just make sure the 'acceptance' rules are strict, to avoid reward hacking.
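A strict acceptance rule could look roughly like the sketch below: time each configuration several times, compare medians, and require every trace size in a fixed grid to pass so that a run cannot succeed by skipping the expensive sizes. All names here (`LOG_T_GRID`, `time_runs`, `median`, `accept`, `accept_all`) are hypothetical; the 16 to 20 range is taken from the suggestion above, and the 2× ratio is the spec's target passed in as a parameter.

```rust
use std::time::{Duration, Instant};

/// Trace lengths (log2 of T) that every acceptance run must cover.
/// Hypothetical constant; the range follows the 2^16 to 2^20 suggestion.
pub const LOG_T_GRID: [u32; 5] = [16, 17, 18, 19, 20];

/// Runs `f` `runs` times and records the wall-clock time of each run.
pub fn time_runs<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Upper median of the samples; less sensitive to outliers than the mean.
pub fn median(mut samples: Vec<Duration>) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}

/// True when the streaming time is within `max_ratio` of the baseline.
pub fn accept(baseline: Duration, streaming: Duration, max_ratio: f64) -> bool {
    streaming.as_secs_f64() <= baseline.as_secs_f64() * max_ratio
}

/// `results` holds (log2 T, baseline median, streaming median) triples.
/// Every reported triple must pass AND every grid size must be reported.
pub fn accept_all(results: &[(u32, Duration, Duration)], max_ratio: f64) -> bool {
    let all_pass = results.iter().all(|(_, b, s)| accept(*b, *s, max_ratio));
    let all_covered = LOG_T_GRID
        .iter()
        .all(|g| results.iter().any(|(l, _, _)| l == g));
    all_pass && all_covered
}
```

Keeping `accept` and `accept_all` as pure functions over recorded timings makes the rule itself easy to unit-test, independent of machine noise.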
You might also want to consider using some of these skills for general development:
https://github.com/multica-ai/andrej-karpathy-skills
I've been using them lately and think they are helpful.
> within-stage peak RAM strictly decreases; confirm per-stage time regression
> stays under 2× (an

You can use tools like allocative (https://crates.io/crates/allocative), or roll your own solution with something like size_of, to mechanically check the prover's data structures, and/or use standard instrumentation tools to measure RSS. Checking the actual data structures that involve T may be better, given unpredictable allocator behavior, though you may need to play around with it and see.
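For the roll-your-own route, a minimal sketch based on `std::mem::size_of` might look like this. It does not use the allocative crate; the `HeapBytes` trait, `StageState`, and `within_sqrt_budget` are hypothetical, and the `Vec` impl is shallow (it counts the buffer but not heap memory owned by the elements).

```rust
use std::mem::size_of;

/// Hypothetical trait: bytes a value owns on the heap.
pub trait HeapBytes {
    fn heap_bytes(&self) -> usize;
}

/// Shallow accounting: counts the allocated buffer (capacity, not length),
/// but not any heap memory owned by the elements themselves.
impl<T> HeapBytes for Vec<T> {
    fn heap_bytes(&self) -> usize {
        self.capacity() * size_of::<T>()
    }
}

/// Hypothetical per-stage prover state whose size should not grow like T.
pub struct StageState {
    pub chunk_buf: Vec<u64>,
    pub accumulators: Vec<u64>,
}

impl HeapBytes for StageState {
    fn heap_bytes(&self) -> usize {
        self.chunk_buf.heap_bytes() + self.accumulators.heap_bytes()
    }
}

/// True when `bytes` fits in `slack * ceil(sqrt(t)) * bytes_per_row`.
/// `slack` is an explicit constant factor, so the bound is deterministic
/// and does not depend on allocator behavior the way RSS does.
pub fn within_sqrt_budget(bytes: usize, t: usize, bytes_per_row: usize, slack: usize) -> bool {
    let sqrt_t = (t as f64).sqrt().ceil() as usize;
    bytes <= slack * sqrt_t * bytes_per_row
}
```

A check like this can run as an ordinary test after each stage is converted: a structure that accidentally holds all `T` rows fails the budget immediately.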
This spec seems very ambitious, and a lot of details are under-specified. From my past thinking about streaming Jolt, these are the blockers to getting the performance overhead to a reasonable level (I still think <2x is way too optimistic):
|
|
I updated the spec in response to all of your feedback. I have slightly downscoped this PR to be just an initial rough implementation with the current algorithms, leaving the optimizations to a few subsequent rounds. In this implementation, all sumchecks will restream the witness logarithmically many or log-log many times (depending on whether windowed sumcheck applies). I have ideas for subsequent rounds to cut down the number of witness regenerations and to use windowed sumcheck in better ways.
Draft spec for the streaming Jolt prover. See specs/jolt-streaming-prover.md.