Formalize Pedersen hash + commitment using BLAKE3 generators#306
Merged
Conversation
kustosz
approved these changes
Jun 5, 2026
kustosz
left a comment
Contributor
There was a problem hiding this comment.
Approving, but please fix the wrong doc
b80efe6 to
151920c
Compare
900622f to
db23cc2
Compare
84cbe3b to
c3b5485
Compare
fa79a81 to
5add67c
Compare
e5cc3ba to
7535129
Compare
5add67c to
9b32bca
Compare
Rename the scalar projections into the Scalar namespace anchored at `def Scalar (p : Prime)`: scalarLo -> Scalar.lo scalarHi -> Scalar.hi scalarValueNat -> Scalar.valueNat so binders typed `s : Scalar p` read with dot-notation (`s.lo.val`, `s.hi.val`) and call sites read `Scalar.valueNat s` instead of a prefixed top-level name. The multiScalarMul builtin and the existing stdlib EmbeddedCurveOps call sites/docstrings follow; the stdlib bridge lemma is renamed to `valueNat_eq_crypto_valueNat` accordingly.
Deterministic square root over the BN254 scalar field (the Grumpkin
base field) at Lampe.Crypto.Bn254.Sqrt: `sqrt : Fp p -> Option (Fp p)`
with an Euler-criterion gate that short-circuits
non-quadratic-residues in milliseconds, so the algorithm terminates in
reasonable time for both QR and non-QR inputs. Needed by the concrete
Pedersen hash-to-curve model.
Correctness is a real proof of the Tonelli-Shanks loop - no
brute-force enumeration fallback - via the invariant
TSInv (a M c t R) : Prop :=
R * R = a * t -- running candidate root
t ^ (2 ^ (M-1)) = 1 -- t has order dividing 2^(M-1)
c ^ (2 ^ (M-1)) = -1 -- c has order exactly 2^M
1 <= M
established at the initial state (using euler_one_of_exists_sqrt and
the non-residue fact), preserved by each tsStep, and giving R * R = a
at termination M = 0 -> t = 1. sqrt_correct/sqrt_complete are stated
against some/isSome; they are model-fidelity assurances with no
downstream consumers by design.
The file has zero native_decide calls: the BN254-specific arithmetic
facts (nonResidue is a non-residue; the oddPart bound) use kernel
decide via fpowR square-and-multiply bridges. Constants carry
descriptive names: twoAdicity, nonResidue, oddPart.
…iltin Noir's `#[builtin(assert_constant)]` is a compiler hint that's semantically a no-op: takes any value of any type and returns unit. Wrapper proofs that step through extracted Noir bodies need to traverse `(#_assertConstant ...)` calls - notably in `std::hash::derive_generators` and `std::verify_proof_with_type` - so we expose it as a real total pure builtin (rather than a `stub`). The Lampe model accepts all inputs unconditionally. Noir's runtime panics if the value isn't a compile-time constant; in the ZK semantic this is fine because we reason about proven circuits, not the constraint generator. Includes the `Steps.lean` recognizer so `steps` traverses `assertConstant` calls automatically.
Add the pure canonical-scalar layer to Lampe.Crypto.EmbeddedCurve and
ascribe the matching range constraint to the multiScalarMul builtin:
- Scalar.Canonical: s.lo.val < 2^128 and s.hi.val < 2^126, matching
the asymmetric LO_BITS = 128 / HI_BITS = 126 split that
Barretenberg's cycle_group::batch_mul gadget actually
range-constrains (cycle_scalar.hpp; see
docs/lampe-builtin-constraint-semantics.md for the source pin). The
docstring pins the witness-limb range-constraint enforcement paths
and the constant-limb caveat to a concrete aztec-packages commit:
the bounds guarantee a unique limb decomposition of a value below
2^254 (limb-uniqueness, not modular canonicity).
- Scalar.canonicalDecomp (with a mkScalar constructor mirroring
mkPoint) and its lemmas: canonicalDecomp_Canonical,
canonicalDecomp_decomposes, canonicalDecomp_unique,
valueNat_inj_canonical, plus the Decidable instance for
Scalar.Canonical. The uniqueness lemma needs the from_field_unsafe
canonical-range disjunction: with pow128^2 about 4*p on BN254,
canonical limbs alone admit multiple decompositions of the same
field element (e.g. (0, 0) and (plo, phi) both decompose 0); the
disjunction forces the Nat sum below p.
- The multiScalarMul builtin in Lampe/Builtin/Crypto/EmbeddedCurve.lean
enforces both gadget constraints in its precondition:
(forall i, (curvePoint? (points.get i)).isSome) /\
(forall i, Scalar.Canonical (scalars.get i))
This is the source-pinned semantic ascription of the gadget's
behaviour: the on-curve constraint is the curve-relation check
inside cycle_group::batch_mul, and canonicality is the
create_limbed_range_constraint emitted on every input scalar via
straus_scalar_slice.cpp.
pow128 becomes a single shared definition in Lampe/Data/Field.lean
(the low-limb base for both the BN254 limb decomposition and the MSM
gadget's scalar split), replacing the duplicate def-equal constants in
Crypto/Bn254 and Crypto/EmbeddedCurve.
…specs - scalar_from_field_spec surfaces Scalar.Canonical in its postcondition: scalar.val < r_scalar < 2^254 already forces hi.val < 2^126 in the limb equation, so callers no longer re-derive the MSM gadget's canonicality precondition. - The MSM call is exposed at both the builtin and wrapper levels in two spec forms: *_spec (result equation only, for rw-style downstream proofs) and *_combined_spec (canonicality /\ result equation, what Pedersen's deterministic corollaries consume). - The stdlib Scalar namespace keeps the load-bearing extraction-side definitions (type, denote, mk, eq, fromBytes?, ...); the thin shims over the crypto-side Scalar.lo/Scalar.hi/Scalar.valueNat are not duplicated. The file opens Lampe.Crypto.EmbeddedCurve: the two Scalar namespaces share no member names, so the short Scalar.* spellings stay unambiguous. - EmbeddedCurveOps/Bn254 and Field/Bn254 follow the core constant unification (shared Lampe.pow128).
Concrete reference semantics for Noir's `derive_pedersen_generators` foreign builtin, matching the algorithm in barretenberg/cpp/src/barretenberg/ecc/groups/group.hpp: - Lampe/Crypto/Pedersen.lean: concrete pedersenGeneratorMathlib (a def, not opaque) implementing the two-stage protocol - build a 64-byte preimage BLAKE3(domain) || BE(index) || 0x28, then a try-and-increment loop bounded by 256 attempts; each attempt parses two BLAKE3 digests as a 512-bit big-endian integer, reduces mod the Grumpkin base field, and recovers y via Bn254.Sqrt from the parity bit (MSB of hash_hi[0]). The wrapper pedersenGeneratorPoint threads the Bn254Scalar typeclass through a dependent-if on p.natVal = r_scalar, and @[simp] pedersenGeneratorPoint_eq exposes the reduction to encodeCurvePoint of pedersenGeneratorMathlib. - Pure pedersenCommitment / pedersenHash: the canonical-decomposition MSM closed forms (domain bytes + input vector + separator offset), with rfl unfolding lemmas, plus defaultDomainBytes, pedersenHashLengthBytes and the generic bridging lemma derivePedersenGenerators_h_enc. Domain separators are typed as List (BitVec 8) end to end, so the byte width is explicit and the literals cannot overflow. - Lampe/Builtin/Crypto/Pedersen.lean: the derivePedersenGenerators foreign-builtin descriptor (total pure, generic in the array sizes (N, M)), wired to the concrete semantic model, with bytesToList (the only Tp-dependent definition) living next to it.
STHoare specs for the 5 Noir stdlib functions on top of derive_pedersen_generators: derive_generators_spec, from_field_unsafe_spec (a substantive exists-limbs decomposition `scalar = xlo + 2^128 * xhi` with the canonical-range disjunction; the limb relation is field-level, since Noir doesn't assert_max_bit_size<128> here - that's what "unsafe" means), and the commitment/hash pairs. The public interface is the four deterministic _spec_canonical theorems, with no caller-supplied canonicality witness: r = encodeCurvePoint (pedersenCommitment ...) (resp. pedersenHash) Canonicality of the existentially-witnessed scalars is surfaced from the MSM combined specs (for hash, directly from the builtin combined spec, since the hash body invokes the builtin without going through the multi_scalar_mul wrapper), and Scalar.canonicalDecomp_unique collapses the existential to Scalar.canonicalDecomp, so the sum reduces to the pure closed form. The with-separator existential _spec variants are proof intermediates only and stay private. The per-slot triple (limb relation / canonical-range disjunction / limb-range canonicality) is factored into a single FromFieldUnsafeWitness predicate, and fromFieldUnsafeRel is element-level so the hash spec's loop invariant reuses it instead of restating it inline.
…ests lib
- Lampe/Tests/Pedersen.lean validates the pure pedersenCommitment /
pedersenHash functions against Aztec's published test vectors from
assert_pedersen in the Noir stdlib: 20 example blocks covering all
N in {1..10} for both pedersen_hash_with_separator and
pedersen_commitment_with_separator with inputs [1, ..., N] and
separator N, each closed by native_decide. This certifies
byte-for-byte agreement with Barretenberg's BLAKE3 hash-to-curve,
try-and-increment counter handling, parity convention, generator
index encoding, and MSM composition.
- Lampe/Tests/EmbeddedCurveOps.lean validates the Scalar.canonicalDecomp /
Scalar.Canonical machinery on BN254: edge cases (0, 1, -1, pow128,
pow128 - 1, pow128 * phi, a random-looking value), the limb identity
f = lo + 2^128 * hi, and the (0, 0) vs (plo, phi) uniqueness corner
case that motivates canonicalDecomp_unique's disjunction
hypothesis.
Both live in the Lampe Tests lib (mirroring Lampe/Tests/Blake3.lean)
so they are exercised by the regular build, instead of standalone
stdlib files requiring direct `lake env lean` invocation.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds the Lampe semantics for Noir's
pedersen_hashandpedersen_commitment(with and without separator).Honestly this was the trickiest builtin to model, still not completely sure I like how the specs turned out. I tried to ground everything on what the Barretenberg gadget's actually constrain. Without some canonicality guarantees a lot of the specs required some existential statements for the limbs decomposition around MSM. Had to change MSM specs because of this, but I think our model is more precise now, it does appear it guarantees canonical decomposition (source:
cycle_group::batch_mul→straus_scalar_slice.cpp:59,create_limbed_range_constraint).Had to implement a fast square-root algorithm for prime fields, landed on Tonelli-Shanks since it we can fix a quadratic non-residue independently of input (not the case for Cipolla's).