Formalize Pedersen hash + commitment using BLAKE3 generators by Eduardogbg · Pull Request #306 · reilabs/lampe

Eduardogbg · 2026-06-05T07:58:15Z

⚠️ Stack order: main ← edu/blake3 ← edu/bn254-refactor ← edu/pedersen.
Merge upwards in that order; this PR retargets to main once the intermediates land.

Adds the Lampe semantics for Noir's pedersen_hash and
pedersen_commitment (with and without separator).

Honestly this was the trickiest builtin to model, still not completely sure I like how the specs turned out. I tried to ground everything on what the Barretenberg gadget's actually constrain. Without some canonicality guarantees a lot of the specs required some existential statements for the limbs decomposition around MSM. Had to change MSM specs because of this, but I think our model is more precise now, it does appear it guarantees canonical decomposition (source: cycle_group::batch_mul → straus_scalar_slice.cpp:59, create_limbed_range_constraint).

Had to implement a fast square-root algorithm for prime fields, landed on Tonelli-Shanks since it we can fix a quadratic non-residue independently of input (not the case for Cipolla's).

kustosz

Approving, but please fix the wrong doc

Rename the scalar projections into the Scalar namespace anchored at `def Scalar (p : Prime)`: scalarLo -> Scalar.lo scalarHi -> Scalar.hi scalarValueNat -> Scalar.valueNat so binders typed `s : Scalar p` read with dot-notation (`s.lo.val`, `s.hi.val`) and call sites read `Scalar.valueNat s` instead of a prefixed top-level name. The multiScalarMul builtin and the existing stdlib EmbeddedCurveOps call sites/docstrings follow; the stdlib bridge lemma is renamed to `valueNat_eq_crypto_valueNat` accordingly.

Deterministic square root over the BN254 scalar field (the Grumpkin base field) at Lampe.Crypto.Bn254.Sqrt: `sqrt : Fp p -> Option (Fp p)` with an Euler-criterion gate that short-circuits non-quadratic-residues in milliseconds, so the algorithm terminates in reasonable time for both QR and non-QR inputs. Needed by the concrete Pedersen hash-to-curve model. Correctness is a real proof of the Tonelli-Shanks loop - no brute-force enumeration fallback - via the invariant TSInv (a M c t R) : Prop := R * R = a * t -- running candidate root t ^ (2 ^ (M-1)) = 1 -- t has order dividing 2^(M-1) c ^ (2 ^ (M-1)) = -1 -- c has order exactly 2^M 1 <= M established at the initial state (using euler_one_of_exists_sqrt and the non-residue fact), preserved by each tsStep, and giving R * R = a at termination M = 0 -> t = 1. sqrt_correct/sqrt_complete are stated against some/isSome; they are model-fidelity assurances with no downstream consumers by design. The file has zero native_decide calls: the BN254-specific arithmetic facts (nonResidue is a non-residue; the oddPart bound) use kernel decide via fpowR square-and-multiply bridges. Constants carry descriptive names: twoAdicity, nonResidue, oddPart.

…iltin Noir's `#[builtin(assert_constant)]` is a compiler hint that's semantically a no-op: takes any value of any type and returns unit. Wrapper proofs that step through extracted Noir bodies need to traverse `(#_assertConstant ...)` calls - notably in `std::hash::derive_generators` and `std::verify_proof_with_type` - so we expose it as a real total pure builtin (rather than a `stub`). The Lampe model accepts all inputs unconditionally. Noir's runtime panics if the value isn't a compile-time constant; in the ZK semantic this is fine because we reason about proven circuits, not the constraint generator. Includes the `Steps.lean` recognizer so `steps` traverses `assertConstant` calls automatically.

Add the pure canonical-scalar layer to Lampe.Crypto.EmbeddedCurve and ascribe the matching range constraint to the multiScalarMul builtin: - Scalar.Canonical: s.lo.val < 2^128 and s.hi.val < 2^126, matching the asymmetric LO_BITS = 128 / HI_BITS = 126 split that Barretenberg's cycle_group::batch_mul gadget actually range-constrains (cycle_scalar.hpp; see docs/lampe-builtin-constraint-semantics.md for the source pin). The docstring pins the witness-limb range-constraint enforcement paths and the constant-limb caveat to a concrete aztec-packages commit: the bounds guarantee a unique limb decomposition of a value below 2^254 (limb-uniqueness, not modular canonicity). - Scalar.canonicalDecomp (with a mkScalar constructor mirroring mkPoint) and its lemmas: canonicalDecomp_Canonical, canonicalDecomp_decomposes, canonicalDecomp_unique, valueNat_inj_canonical, plus the Decidable instance for Scalar.Canonical. The uniqueness lemma needs the from_field_unsafe canonical-range disjunction: with pow128^2 about 4*p on BN254, canonical limbs alone admit multiple decompositions of the same field element (e.g. (0, 0) and (plo, phi) both decompose 0); the disjunction forces the Nat sum below p. - The multiScalarMul builtin in Lampe/Builtin/Crypto/EmbeddedCurve.lean enforces both gadget constraints in its precondition: (forall i, (curvePoint? (points.get i)).isSome) /\ (forall i, Scalar.Canonical (scalars.get i)) This is the source-pinned semantic ascription of the gadget's behaviour: the on-curve constraint is the curve-relation check inside cycle_group::batch_mul, and canonicality is the create_limbed_range_constraint emitted on every input scalar via straus_scalar_slice.cpp. pow128 becomes a single shared definition in Lampe/Data/Field.lean (the low-limb base for both the BN254 limb decomposition and the MSM gadget's scalar split), replacing the duplicate def-equal constants in Crypto/Bn254 and Crypto/EmbeddedCurve.

…specs - scalar_from_field_spec surfaces Scalar.Canonical in its postcondition: scalar.val < r_scalar < 2^254 already forces hi.val < 2^126 in the limb equation, so callers no longer re-derive the MSM gadget's canonicality precondition. - The MSM call is exposed at both the builtin and wrapper levels in two spec forms: *_spec (result equation only, for rw-style downstream proofs) and *_combined_spec (canonicality /\ result equation, what Pedersen's deterministic corollaries consume). - The stdlib Scalar namespace keeps the load-bearing extraction-side definitions (type, denote, mk, eq, fromBytes?, ...); the thin shims over the crypto-side Scalar.lo/Scalar.hi/Scalar.valueNat are not duplicated. The file opens Lampe.Crypto.EmbeddedCurve: the two Scalar namespaces share no member names, so the short Scalar.* spellings stay unambiguous. - EmbeddedCurveOps/Bn254 and Field/Bn254 follow the core constant unification (shared Lampe.pow128).

Concrete reference semantics for Noir's `derive_pedersen_generators` foreign builtin, matching the algorithm in barretenberg/cpp/src/barretenberg/ecc/groups/group.hpp: - Lampe/Crypto/Pedersen.lean: concrete pedersenGeneratorMathlib (a def, not opaque) implementing the two-stage protocol - build a 64-byte preimage BLAKE3(domain) || BE(index) || 0x28, then a try-and-increment loop bounded by 256 attempts; each attempt parses two BLAKE3 digests as a 512-bit big-endian integer, reduces mod the Grumpkin base field, and recovers y via Bn254.Sqrt from the parity bit (MSB of hash_hi[0]). The wrapper pedersenGeneratorPoint threads the Bn254Scalar typeclass through a dependent-if on p.natVal = r_scalar, and @[simp] pedersenGeneratorPoint_eq exposes the reduction to encodeCurvePoint of pedersenGeneratorMathlib. - Pure pedersenCommitment / pedersenHash: the canonical-decomposition MSM closed forms (domain bytes + input vector + separator offset), with rfl unfolding lemmas, plus defaultDomainBytes, pedersenHashLengthBytes and the generic bridging lemma derivePedersenGenerators_h_enc. Domain separators are typed as List (BitVec 8) end to end, so the byte width is explicit and the literals cannot overflow. - Lampe/Builtin/Crypto/Pedersen.lean: the derivePedersenGenerators foreign-builtin descriptor (total pure, generic in the array sizes (N, M)), wired to the concrete semantic model, with bytesToList (the only Tp-dependent definition) living next to it.

STHoare specs for the 5 Noir stdlib functions on top of derive_pedersen_generators: derive_generators_spec, from_field_unsafe_spec (a substantive exists-limbs decomposition `scalar = xlo + 2^128 * xhi` with the canonical-range disjunction; the limb relation is field-level, since Noir doesn't assert_max_bit_size<128> here - that's what "unsafe" means), and the commitment/hash pairs. The public interface is the four deterministic _spec_canonical theorems, with no caller-supplied canonicality witness: r = encodeCurvePoint (pedersenCommitment ...) (resp. pedersenHash) Canonicality of the existentially-witnessed scalars is surfaced from the MSM combined specs (for hash, directly from the builtin combined spec, since the hash body invokes the builtin without going through the multi_scalar_mul wrapper), and Scalar.canonicalDecomp_unique collapses the existential to Scalar.canonicalDecomp, so the sum reduces to the pure closed form. The with-separator existential _spec variants are proof intermediates only and stay private. The per-slot triple (limb relation / canonical-range disjunction / limb-range canonicality) is factored into a single FromFieldUnsafeWitness predicate, and fromFieldUnsafeRel is element-level so the hash spec's loop invariant reuses it instead of restating it inline.

…ests lib - Lampe/Tests/Pedersen.lean validates the pure pedersenCommitment / pedersenHash functions against Aztec's published test vectors from assert_pedersen in the Noir stdlib: 20 example blocks covering all N in {1..10} for both pedersen_hash_with_separator and pedersen_commitment_with_separator with inputs [1, ..., N] and separator N, each closed by native_decide. This certifies byte-for-byte agreement with Barretenberg's BLAKE3 hash-to-curve, try-and-increment counter handling, parity convention, generator index encoding, and MSM composition. - Lampe/Tests/EmbeddedCurveOps.lean validates the Scalar.canonicalDecomp / Scalar.Canonical machinery on BN254: edge cases (0, 1, -1, pow128, pow128 - 1, pow128 * phi, a random-looking value), the limb identity f = lo + 2^128 * hi, and the (0, 0) vs (plo, phi) uniqueness corner case that motivates canonicalDecomp_unique's disjunction hypothesis. Both live in the Lampe Tests lib (mirroring Lampe/Tests/Blake3.lean) so they are exercised by the regular build, instead of standalone stdlib files requiring direct `lake env lean` invocation.

Eduardogbg force-pushed the edu/pedersen branch from d5d8526 to 9916496 Compare June 5, 2026 08:06

kustosz approved these changes Jun 5, 2026

View reviewed changes

Comment thread Lampe/Lampe/Builtin/Crypto/EmbeddedCurve.lean Outdated

Eduardogbg force-pushed the edu/pedersen branch 3 times, most recently from b80efe6 to 151920c Compare June 8, 2026 09:44

Eduardogbg force-pushed the edu/bn254-refactor branch from 900622f to db23cc2 Compare June 10, 2026 07:03

Eduardogbg force-pushed the edu/pedersen branch 2 times, most recently from 84cbe3b to c3b5485 Compare June 10, 2026 16:45

Eduardogbg force-pushed the edu/bn254-refactor branch from fa79a81 to 5add67c Compare June 10, 2026 23:28

Eduardogbg force-pushed the edu/pedersen branch 2 times, most recently from e5cc3ba to 7535129 Compare June 11, 2026 08:00

Eduardogbg force-pushed the edu/bn254-refactor branch from 5add67c to 9b32bca Compare June 12, 2026 03:57

Eduardogbg force-pushed the edu/pedersen branch from 7535129 to dcf4104 Compare June 12, 2026 03:57

Base automatically changed from edu/bn254-refactor to main June 12, 2026 05:29

Eduardogbg added 8 commits June 12, 2026 02:30

Eduardogbg force-pushed the edu/pedersen branch from dcf4104 to 8a013eb Compare June 12, 2026 05:30

Eduardogbg merged commit 5ac9792 into main Jun 12, 2026
13 checks passed

Eduardogbg deleted the edu/pedersen branch June 12, 2026 05:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Formalize Pedersen hash + commitment using BLAKE3 generators#306

Formalize Pedersen hash + commitment using BLAKE3 generators#306
Eduardogbg merged 8 commits into
mainfrom
edu/pedersen

Eduardogbg commented Jun 5, 2026

Uh oh!

kustosz left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Eduardogbg commented Jun 5, 2026

Uh oh!

kustosz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants