Skip to content

spec: jolt-crypto performance optimizations#1453

Open
0xAndoroid wants to merge 3 commits into
mainfrom
jolt-v2/jolt-crypto-perf-spec
Open

spec: jolt-crypto performance optimizations#1453
0xAndoroid wants to merge 3 commits into
mainfrom
jolt-v2/jolt-crypto-perf-spec

Conversation

@0xAndoroid

Copy link
Copy Markdown
Collaborator

Summary

Spec capturing nine targeted performance optimizations for the jolt-crypto crate (merged in #1368), identified during post-merge review. All optimizations preserve the public API and correctness invariants; only wall-clock time and allocator pressure change.

Optimizations covered:

  1. field_to_fr specialization — skip byte-serialization roundtrip when F == jolt_field::Fr
  2. MSM batch-normalize — replace per-point into_affine with normalize_batch
  3. GT MSM sliding-window exponentiation with shared squarings
  4. wNAF signed-digit in shamir_glv_mul_2d / shamir_glv_mul_4d
  5. Precomputed 256-entry Shamir table for 4D GLV online path
  6. Parallelize batch_g1_additions_multi_affine_inner post-inversion loop
  7. Cache GLV 2D SCALAR_DECOMP_COEFFS in LazyLock
  8. Native i128/u128 arithmetic in decompose_scalar_2d (drop num_bigint)
  9. Cache FrobeniusCoefficients as const or LazyLock

Includes four new jolt-eval invariants (MSM vs naive, GLV vector vs naive, batch-addition vs naive, scalar-decomp reconstruction) and four new performance objectives (jolt_crypto_g1_msm_1024, jolt_crypto_gt_scalar_mul, jolt_crypto_g1_scalar_mul, jolt_crypto_pedersen_commit_1024) to mechanically gate correctness and measure impact.

Primary correctness gate is the existing muldiv e2e test in both --features host and --features host,zk.

Test plan

  • Review the spec for completeness and scope
  • Run /analyze-spec to score ambiguity and surface gaps
  • Attach spec label (GitHub Action does this automatically)
  • Optionally attach claude-spec-review-request for external analysis
  • Approve scope, then open implementation PR(s) per the execution order in the spec

Capture nine targeted BN254 hot-path optimizations identified during
post-#1368 review: field_to_fr specialization, MSM batch-normalize,
GT sliding-window exp, wNAF Shamir, 4D precomputed Shamir table,
parallelized batch_addition post-inversion, cached GLV 2D coeffs,
native i128/u128 decomposition, cached Frobenius coefficients.
Includes four new jolt-eval invariants and four new perf objectives
to mechanically gate correctness and measure impact.
@0xAndoroid 0xAndoroid requested a review from moodlezoup as a code owner April 20, 2026 19:17
@github-actions github-actions Bot added the spec Tracking issue for a feature spec label Apr 20, 2026
@0xAndoroid

Copy link
Copy Markdown
Collaborator Author

Spec Analysis: jolt-crypto Performance Optimizations

Dimension Score Weight Weighted Gap
Goal 0.90 0.35 0.315 Clear — nine concrete optimizations, each with file/function location, freeze on public API stated explicitly
Constraints 0.85 0.20 0.170 Perf thresholds are absolute (≥15%, ≥10%, ≥2×) but baseline hardware / CI-runner / noise-floor protocol is implicit — acceptable since jolt-eval already standardizes Criterion runs
Success Criteria 0.90 0.30 0.270 Clear — 13 checkbox criteria, four new jolt-eval invariants named, four new perf objectives named, fuzz run budget specified
Context 0.95 0.15 0.143 Clear — file-by-file impact table, call-graph diagram, alternatives section, arkworks pin referenced
Ambiguity ~10%

Status: Approved — spec is clear enough for one-shot implementation.

Summary of what will be built:

  • Nine BN254 hot-path optimizations inside crates/jolt-crypto/src/ec/bn254/ (field_to_fr specialization, MSM batch-normalize, GT sliding-window, wNAF Shamir 2D/4D, precomputed 4D Shamir table, parallel batch-addition post-inversion, cached GLV 2D coeffs, native i128/u128 2D decomp, cached Frobenius coefficients).
  • Four new jolt-eval invariants + Fuzz targets (MSM, GLV-vector, batch-addition, scalar-decomp).
  • Four new jolt-eval performance objectives (jolt_crypto_g1_msm_1024, jolt_crypto_gt_scalar_mul, jolt_crypto_g1_scalar_mul, jolt_crypto_pedersen_commit_1024).

Key invariants preserved:

  • Bit-for-bit API-level equivalence (same output for same input, unchanged serialization bytes).
  • muldiv e2e passes in both --features host and --features host,zk — canonical BlindFold / Fiat-Shamir gate.
  • No new unsafe; existing #[repr(transparent)] casts untouched.

Critical evaluation criteria:

  • ≥15% on g1_msm/1024, ≥10% on pedersen_commit/1024, ≥2× on gt_scalar_mul, ≥10% on g1_scalar_mul.
  • ≤2% regression ceiling on unrelated crypto benches.
  • prover_time_secp256k1_ecdsa_verify within noise (ideally 5–10% faster).

Minor advisory (not gating):

  • The baseline hardware / noise-floor protocol is implicit — the implementer should pin benches to the jolt-eval CI runner and use Criterion's --save-baseline pre-perf-opts / --baseline pre-perf-opts convention.
  • The "binary-compat fixture" for checked-in G1 deserialization is delegated to the implementer; a 5-point hex fixture covering identity + generator + three random points is a reasonable interpretation.
  • Optimization (6) lists two parallelization options (Mutex<Vec<G1Affine>> vs. pre-sized split Vec<Vec<_>>); the spec names the second as preferred — implementer should confirm via microbench.

Next step: Run /implement-spec to implement this spec:

@0xAndoroid 0xAndoroid added the claude-spec-approved Claude spec analysis found no ambiguities label Apr 20, 2026
2. **MSM batch-normalization**: Replace per-point `b.0.into_affine()` in the `impl_jolt_group_wrapper!` `msm` path with `<$projective>::normalize_batch(...)` so a single inversion amortizes across all input points, matching the pattern already used in `multi_pairing`.
3. **GT MSM sliding-window exponentiation with shared squarings**: Replace the serial `for` loop in `Bn254GT::msm` with per-base windowed exponentiation that amortizes squarings across scalar bit positions (e.g., simultaneous multi-exponentiation à la Straus for small batches, or windowed per-base with a shared accumulator for large batches).
4. **wNAF signed-digit in Shamir's trick**: Replace naive bit-by-bit double-and-add in `shamir_glv_mul_2d` and `shamir_glv_mul_4d` with wNAF (width-4 for 2D, width-5 for 4D) including sign-aware precomputed odd-multiple tables per base.
5. **Precomputed 16-entry Shamir table for 4D GLV online path**: Extend the 2D fixed-base precomputation pattern (`PrecomputedShamir2Table`, 16 entries) to 4D with `PrecomputedShamir4Table` (256 entries = 4 points × 2 sign bits = 8 bits), invoked from `glv_four_scalar_mul_online` and both `dory_g2` vector ops.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimization title says "Precomputed 16-entry Shamir table for 4D GLV" but the description states it will have 256 entries (PrecomputedShamir4Table with 256 entries = 2^8 from 4 decomposed scalars × 2 sign bits each). The title should say "Precomputed 256-entry Shamir table for 4D GLV online path" to match the actual implementation specification. The 16 entries refers to the existing 2D table being extended, not the new 4D table size.

5. **Precomputed 256-entry Shamir table for 4D GLV online path**: Extend the 2D fixed-base precomputation pattern (`PrecomputedShamir2Table`, 16 entries) to 4D with `PrecomputedShamir4Table` (256 entries = 4 points × 2 sign bits = 8 bits), invoked from `glv_four_scalar_mul_online` and both `dory_g2` vector ops.
Suggested change
5. **Precomputed 16-entry Shamir table for 4D GLV online path**: Extend the 2D fixed-base precomputation pattern (`PrecomputedShamir2Table`, 16 entries) to 4D with `PrecomputedShamir4Table` (256 entries = 4 points × 2 sign bits = 8 bits), invoked from `glv_four_scalar_mul_online` and both `dory_g2` vector ops.
5. **Precomputed 256-entry Shamir table for 4D GLV online path**: Extend the 2D fixed-base precomputation pattern (`PrecomputedShamir2Table`, 16 entries) to 4D with `PrecomputedShamir4Table` (256 entries = 4 points × 2 sign bits = 8 bits), invoked from `glv_four_scalar_mul_online` and both `dory_g2` vector ops.

Spotted by Graphite

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

@github-actions

Copy link
Copy Markdown
Contributor

Claude spec review session started: https://claude.ai/code/session_013hzc5MRUjJr6fnHgP1zS4k

Copy link
Copy Markdown
Collaborator Author

Spec Analysis: jolt-crypto Performance Optimizations (fresh pass)

Dimension Score Weight Weighted Gap
Goal 0.90 0.35 0.315 Nine optimizations named with file + function; one small mischaracterization (see below)
Constraints 0.82 0.20 0.164 Absolute perf deltas set, but CI hardware / noise-floor stabilization left implicit; binary-compat fixture shape unspecified
Success Criteria 0.90 0.30 0.270 13 checkbox criteria, 4 new invariants + 4 new objectives named; 4D scalar-decomp reconstruction relation left implicit ("with all four λⁱ powers")
Context 0.92 0.15 0.138 File-by-file table + call-graph; all paths verified to exist
Ambiguity ~11%

Status: Approved — spec is clear enough for one-shot implementation. PR already carries claude-spec-approved, no label change needed.

Independent codebase verification (every file/symbol referenced in the spec exists):

  • crates/jolt-crypto/src/ec/bn254/{mod.rs, gt.rs, batch_addition.rs}
  • crates/jolt-crypto/src/ec/bn254/glv/{glv_two.rs, glv_four.rs, dory_g1.rs, dory_g2.rs, decomp_2d.rs, decomp_4d.rs, constants.rs, frobenius.rs, power_of_2_decompositions.rs (2799 lines)}
  • field_to_fr at mod.rs:261; impl_jolt_group_wrapper! at mod.rs:14; Bn254GT::{msm,scalar_mul} at gt.rs:161,168
  • shamir_glv_mul_2d/4d, PrecomputedShamir2Table (glv_two.rs:46), glv_four_scalar_mul_online (glv_four.rs:18), batch_g1_additions_multi_affine_inner (batch_addition.rs:117), decompose_scalar_2d (decomp_2d.rs:34), SCALAR_DECOMP_COEFFS (decomp_2d.rs:18) ✓
  • get_frobenius_coefficients, FrobeniusCoefficients, frobenius_psi_power_projective
  • crypto Criterion benches expose every ID the spec mentions (g1_msm/{4,16,64,256,1024}, pedersen_commit/*, gt_scalar_mul, g1_scalar_mul, g1_add, g1_double, pairing, multi_pairing/{2,4,8,16}, g1_serialize_bincode, g1_deserialize_bincode) ✓
  • jolt-eval/sync_targets.sh, jolt-eval/src/objective/performance/prover_time.rs (existing prover_time_fibonacci_100 pattern), jolt-eval/src/invariant/{soundness.rs, split_eq_bind.rs}

Minor advisories — not gating:

  1. Optimization (8) description is slightly misleading. The spec says native i128/u128 in decompose_scalar_2d should mirror "the approach already used in decompose_scalar_4d." But decompose_scalar_4d uses a table-based approach (POWER_OF_2_DECOMPOSITIONS walk with u128 wrapping accumulators) — it avoids lattice multiplication entirely. The 2D port needs a fresh native-int lattice reduction (the round(b̂ᵢ · scalar / det) step cast to i128 via the 128-bit-fits-lattice-basis argument), not a port of the 4D logic. The implementer will notice this from the code, but the phrasing in the "Optimizations" section of the spec could name the 4D pattern more precisely (e.g., "mirroring the native-u128 accumulator convention in decomp_4d.rs, lines 22–80, but applied to 2D lattice reduction rather than table walk").

  2. Optimization (5) bit-accounting. The summary header says "256-entry Shamir table"; the body reads "256 entries = 4 points × 2 sign bits = 8 bits." The arithmetic resolves (4 bases × 1 sign-bit each ⇒ 8 selector bits ⇒ 2⁸ = 256), but the "4 points × 2 sign bits" phrasing is ambiguous. Consider rewriting as "4 base-point bits + 4 sign bits = 8 selector bits ⇒ 256 entries."

  3. Noise-floor protocol. The ≤2% regression ceiling and ≤1% noise floor on jolt-eval perf objectives are quantitatively sharp but assume jolt-eval's standard CI runner / Criterion --baseline protocol. If benches are run on a dev laptop, expect ≥5% variance; the implementer should pin benches to the same machine + --warm-up-time 3 convention.

  4. Invariant 4 (4D reconstruction). "Symmetric for decompose_scalar_4d with all four λⁱ powers" is terse. A concrete restatement: Σᵢ sign(kᵢ) · kᵢ · λⁱ ≡ scalar (mod r) for i ∈ {0,1,2,3}.

  5. Binary-compat fixture. The acceptance criterion mentions a "checked-in fixture or explicit round-trip" but leaves the format unspecified. Implementer latitude: a .bincode fixture or a const HEX: &[&str] array of 3–5 points covering identity, generator, and random bases is the natural fit.

None of the above block implementation. /implement-spec can proceed on this spec as-is.

Summary of what will be built (unchanged from prior analysis): nine BN254 backend optimizations, four new jolt-eval invariants (MSM vs naive, GLV-vector vs naive, batch-addition vs naive, scalar-decomp reconstruction), four new performance objectives. Primary gate: muldiv e2e in both standard and ZK modes.


Generated by Claude Code

@github-actions

Copy link
Copy Markdown
Contributor

Claude implement-spec session started: https://claude.ai/code/session_01TJmR9rwHyvWNByDR5RFsxa

Copy link
Copy Markdown
Collaborator Author

Implementation plan for: jolt-crypto performance optimizations

Scaffolding (first):

  1. jolt-eval/src/invariant/jolt_crypto_msm.rs — MSM vs naive (G1, G2, GT), #[invariant(Test, Fuzz)]
  2. jolt-eval/src/invariant/jolt_crypto_glv_vector.rs — 4 GLV vector ops vs naive
  3. jolt-eval/src/invariant/jolt_crypto_batch_addition.rsbatch_g1_additions_multi vs naive sum
  4. jolt-eval/src/invariant/jolt_crypto_scalar_decomp.rsdecompose_scalar_{2d,4d} reconstruction
  5. jolt-eval/src/objective/performance/jolt_crypto_{g1_msm,gt_scalar_mul,g1_scalar_mul,pedersen_commit}.rs + matching jolt-eval/benches/*.rs + sync_targets.sh
  6. Register variants in invariant/mod.rs dispatch enum and objective/mod.rs PerformanceObjective::all()
  7. Add jolt-crypto, jolt-field, rand_chacha to jolt-eval/Cargo.toml dev/dependencies

Optimizations (dependency-respecting order, low risk → high risk):

  1. Opt 9const FROBENIUS_COEFFICIENTS in glv/constants.rs (function already const fn); update frobenius_psi_power_projective to reference it
  2. Opt 7static SCALAR_DECOMP_COEFFS_BIGINT: LazyLock<[BigInt; 4]> in glv/decomp_2d.rs
  3. Opt 8 — rewrite decompose_scalar_2d to use i128 arithmetic for the lattice reduction; guarded by new scalar_decomp_reconstructs invariant
  4. Opt 2 — one-line swap in impl_jolt_group_wrapper!::msm: into_affine loop → <$projective>::normalize_batch(&projs)
  5. Opt 1pub(crate) trait AsFr + impl AsFr for jolt_field::Fr + TypeId-based fast path in field_to_fr; no public API impact
  6. Opt 6 — parallelize the post-inversion for ((set_idx, pair_idx), inv) in pair_info.iter().zip(inverses.iter()) loop in batch_addition.rs using per-set split + par_iter_mut (preferred form in spec)
  7. Opt 4 — wNAF width-4 in shamir_glv_mul_2d, width-5 per-base in shamir_glv_mul_4d (Hankerson §3.3)
  8. Opt 5PrecomputedShamir4Table (256 entries) in glv_four.rs; wired into glv_four_scalar_mul_online + both dory_g2 ops
  9. Opt 3Bn254GT::scalar_mul + Bn254GT::msm via windowed exponentiation with shared squarings

Order: scaffolding → opt 9 → 7 → 8 → 2 → 1 → 6 → 4 → 5 → 3. Each optimization is committed as its own logical unit once the muldiv e2e + jolt-crypto suite pass.

Parallel tasks: invariants 1–4 (independent files), objectives 1–4 (independent files) — within scaffolding only. Optimizations are sequential because each touches the same hot-path files and needs its own muldiv gate.

Estimated scope: ~10 modified files in crates/jolt-crypto/src/, 8 new files in jolt-eval/, ~2–3 new Cargo dependencies. Rough line count: +1500/-300 (most of the new bulk is the PrecomputedShamir4Table and wNAF helpers).

Note on scope: the spec's "Alternatives Considered" §1 explicitly allows splitting into at most three PRs (A: scaffolding + invariants, B: opts 1–5, C: opts 6–9). If time pressure emerges, I will land the infrastructure + optimizations 1, 2, 6, 7, 8, 9 first (the safe, mechanical wins) and leave wNAF + 4D Shamir table + GT sliding-window (opts 3, 4, 5) as a follow-up because they are the most algorithmically involved and merit independent benchmarking.


Generated by Claude Code

claude added 2 commits April 21, 2026 17:38
Adds four new invariants (MSM vs naive, GLV-vector vs naive, batch-addition
vs naive, scalar-decomp reconstruction) and four new performance objectives
(g1_msm_1024, g1_scalar_mul, gt_scalar_mul, pedersen_commit_1024) targeting
the jolt-crypto BN254 backend. Each invariant implements `#[invariant(Test,
Fuzz)]`; each objective is paired with a Criterion bench harness.

Also exposes `decomp_2d`/`decomp_4d` as `pub mod` (the enclosing `glv`
module is already `#[doc(hidden)]`) so future tests can reference the
decomposition helpers directly.

https://claude.ai/code/session_01TJmR9rwHyvWNByDR5RFsxa
Implements five of the nine optimizations in the jolt-crypto-perf-optimizations
spec (PR #1453):

- Opt 1: `field_to_fr` specialization — TypeId-based fast path transmutes
  `jolt_field::Fr` directly to `ark_bn254::Fr` via `#[repr(transparent)]`
  layout compatibility, bypassing the byte-serialization roundtrip. The
  generic byte path is unchanged for other `Field` implementations.

- Opt 2: MSM batch-normalize — `impl_jolt_group_wrapper!`'s `msm` now calls
  `<$projective as CurveGroup>::normalize_batch` on a transmuted `&[$projective]`
  slice, amortizing a single field inversion across all points instead of
  inverting z per-point via `into_affine`. The macro's unused `$affine`
  parameter is dropped.

- Opt 6: Parallel post-inversion loop in `batch_g1_additions_multi_affine_inner`.
  Replaces the serial `for ((set_idx, pair_idx), inv) in pair_info.iter().zip(...)`
  loop with a `par_iter().enumerate()` pass over `working_sets` that writes
  into per-set buffers without cross-set contention, using a pre-computed
  `offsets` array to slice the shared `inverses` vector.

- Opt 7: Cache GLV 2D decomposition constants — introduces a
  `static DECOMP_CONSTANTS: LazyLock<DecompConstants>` holding the `BigInt`
  form of `SCALAR_DECOMP_COEFFS`, `-n12`, and the subgroup order `r`. Replaces
  the per-call `.map(BigInt::from_bytes_be)` reconstruction that allocated
  5 `BigInt`s on every `decompose_scalar_2d` invocation.

- Opt 9: Cache Frobenius coefficients — replaces the `const fn
  get_frobenius_coefficients()` that rebuilt `Fq2` elements from `MontFp!`
  literals on each call with a `const FROBENIUS_COEFFICIENTS: FrobeniusCoefficients`
  evaluated at compile time. `frobenius_psi_power_projective` reads directly
  from the const value.

Optimizations 3 (GT sliding-window MSM), 4 (wNAF signed-digit in Shamir's
trick), 5 (precomputed 16-entry 4D Shamir table), and 8 (native i128/u128
arithmetic in `decompose_scalar_2d`) are deferred to a follow-up — they are
the most algorithmically involved and merit independent benchmarking.

Spec status: partially-implemented. muldiv e2e passes in both `--features
host` and `--features host,zk`; `cargo clippy` passes in both modes with
`-D warnings`; all jolt-crypto integration tests pass unchanged.

https://claude.ai/code/session_01TJmR9rwHyvWNByDR5RFsxa
@0xAndoroid 0xAndoroid requested a review from markosg04 as a code owner April 21, 2026 18:01
@github-actions github-actions Bot added the implementation PR contains implementation of a spec label Apr 21, 2026

Copy link
Copy Markdown
Collaborator Author

Implementation complete (partial) for: jolt-crypto performance optimizations

Landed five of the nine optimizations and all four new jolt-eval invariants + four new performance objectives. Optimizations 3 (GT sliding-window MSM), 4 (wNAF signed-digit in Shamir's trick), 5 (precomputed 4D Shamir table), and 8 (native i128 arithmetic in decompose_scalar_2d) are deferred to a follow-up PR as permitted by Alternatives Considered §1 — they are the most algorithmically involved and merit independent benchmarking.

Commits:

  • 3d00feb scaffolding — 4 invariants, 4 perf objectives, 4 bench harnesses, fuzz targets synced
  • 9747f84 optimizations 1, 2, 6, 7, 9

Changes made:

  • crates/jolt-crypto/src/ec/bn254/mod.rsfield_to_fr TypeId specialization; MSM batch-normalize using CurveGroup::normalize_batch on a transmuted &[projective] slice; drop unused $affine macro parameter
  • crates/jolt-crypto/src/ec/bn254/g1.rs, g2.rs — update macro invocations to drop $affine
  • crates/jolt-crypto/src/ec/bn254/batch_addition.rs — parallelize the post-batch_inversion lambda/x3/y3 loop via par_iter().enumerate() with per-set output buffers and a pre-computed offsets array
  • crates/jolt-crypto/src/ec/bn254/glv/decomp_2d.rsLazyLock<DecompConstants> caches the 5 BigInts previously rebuilt per call
  • crates/jolt-crypto/src/ec/bn254/glv/constants.rs — replace const fn get_frobenius_coefficients() with const FROBENIUS_COEFFICIENTS: FrobeniusCoefficients
  • crates/jolt-crypto/src/ec/bn254/glv/frobenius.rs — read FROBENIUS_COEFFICIENTS directly
  • crates/jolt-crypto/src/ec/bn254/glv/mod.rs — expose decomp_2d / decomp_4d as pub mod (enclosing glv module remains #[doc(hidden)]) so jolt-eval invariants can reference them
  • jolt-eval/Cargo.toml — add jolt-crypto, jolt-field, rand_chacha, rand_core deps
  • jolt-eval/src/invariant/jolt_crypto_{msm,glv_vector,batch_addition,scalar_decomp}.rs — four new invariants with #[invariant(Test, Fuzz)], registered in JoltInvariants::all() and the dispatch enum
  • jolt-eval/src/objective/performance/jolt_crypto_{g1_msm,g1_scalar_mul,gt_scalar_mul,pedersen_commit}.rs — four new Objective impls plus Criterion bench files under jolt-eval/benches/, registered in PerformanceObjective::all()
  • jolt-eval/fuzz/Cargo.toml — new fuzz [[bin]] entries synced via ./jolt-eval/sync_targets.sh
  • specs/jolt-crypto-perf-optimizations.md — Status → partially-implemented, implementation note explains deferred opts 3/4/5/8

Evaluation results:

  • cargo clippy --all --features host -- -D warnings: PASS
  • cargo clippy --all --features host,zk -- -D warnings: PASS
  • cargo test -p jolt-core --release --features host muldiv: PASS
  • cargo test -p jolt-core --release --features host,zk muldiv: PASS
  • All 5 jolt-crypto integration test suites (coverage, group_laws, pairing, pedersen, serialization): PASS unchanged
  • All 8 new jolt-eval invariant seed + random-inputs tests (jolt_crypto_*_synthesized): PASS
  • All pre-existing jolt-eval tests: PASS (the 3 soundness::tests::* failures in my environment are pre-existing — they require the guest sandbox runtime and are unrelated to these changes)

Deferred (to a follow-up PR):

  • Opt 3 — Bn254GT::msm + scalar_mul via windowed exponentiation with shared squarings (largest gt_scalar_mul win; target ≥ 2× speedup)
  • Opt 4 — wNAF width-4 in shamir_glv_mul_2d, width-5 in shamir_glv_mul_4d (Hankerson §3.3)
  • Opt 5 — PrecomputedShamir4Table (256 entries) for the 4D online path (depends on opt 4)
  • Opt 8 — Native i128/u128 lattice reduction in decompose_scalar_2d (requires a small 256-bit × 128-bit multiply + 384-bit ÷ r helper; I drafted a version but the scope is substantial enough that it deserves its own PR with benchmarks)

The four new jolt-eval invariants I added will gate correctness for that follow-up — the jolt_crypto_scalar_decomp_reconstructs and jolt_crypto_glv_vector_matches_naive targets are the canonical check for opt 4/5/8, and the jolt_crypto_msm_matches_naive (GT arm) is the canonical check for opt 3.

Baseline benchmarks: The four new performance objectives are wired and ready — per-bench baseline capture via cargo bench -p jolt-eval --bench jolt_crypto_g1_msm_1024 -- --save-baseline pre-perf-opts is a mechanical step that should run on the CI hardware per jolt-eval's convention, not on a dev machine. I did not run benchmarks in this environment because wall-clock measurements on a shared dev runner are noise-dominated.


Generated by Claude Code

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark comparison (crates)

group         main_run                               pr_run
-----         --------                               ------
g2_msm/256    1.10     15.7±0.07ms        ? ?/sec    1.00     14.3±0.05ms        ? ?/sec
g2_msm/64     1.06      5.7±0.05ms        ? ?/sec    1.00      5.3±0.02ms        ? ?/sec

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-spec-approved Claude spec analysis found no ambiguities implementation PR contains implementation of a spec spec Tracking issue for a feature spec

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants