Skip to content

Formalize Pedersen hash + commitment using BLAKE3 generators#306

Merged
Eduardogbg merged 8 commits into
mainfrom
edu/pedersen
Jun 12, 2026
Merged

Formalize Pedersen hash + commitment using BLAKE3 generators#306
Eduardogbg merged 8 commits into
mainfrom
edu/pedersen

Conversation

@Eduardogbg

Copy link
Copy Markdown
Collaborator

⚠️ Stack order: mainedu/blake3edu/bn254-refactoredu/pedersen.
Merge upwards in that order; this PR retargets to main once the intermediates land.

Adds the Lampe semantics for Noir's pedersen_hash and
pedersen_commitment (with and without separator).

Honestly this was the trickiest builtin to model, still not completely sure I like how the specs turned out. I tried to ground everything on what the Barretenberg gadget's actually constrain. Without some canonicality guarantees a lot of the specs required some existential statements for the limbs decomposition around MSM. Had to change MSM specs because of this, but I think our model is more precise now, it does appear it guarantees canonical decomposition (source: cycle_group::batch_mulstraus_scalar_slice.cpp:59, create_limbed_range_constraint).

Had to implement a fast square-root algorithm for prime fields, landed on Tonelli-Shanks since it we can fix a quadratic non-residue independently of input (not the case for Cipolla's).

@kustosz kustosz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but please fix the wrong doc

Comment thread Lampe/Lampe/Builtin/Crypto/EmbeddedCurve.lean Outdated
@Eduardogbg Eduardogbg force-pushed the edu/pedersen branch 3 times, most recently from b80efe6 to 151920c Compare June 8, 2026 09:44
@Eduardogbg Eduardogbg force-pushed the edu/bn254-refactor branch from 900622f to db23cc2 Compare June 10, 2026 07:03
@Eduardogbg Eduardogbg force-pushed the edu/pedersen branch 2 times, most recently from 84cbe3b to c3b5485 Compare June 10, 2026 16:45
@Eduardogbg Eduardogbg force-pushed the edu/bn254-refactor branch from fa79a81 to 5add67c Compare June 10, 2026 23:28
@Eduardogbg Eduardogbg force-pushed the edu/pedersen branch 2 times, most recently from e5cc3ba to 7535129 Compare June 11, 2026 08:00
@Eduardogbg Eduardogbg force-pushed the edu/bn254-refactor branch from 5add67c to 9b32bca Compare June 12, 2026 03:57
Base automatically changed from edu/bn254-refactor to main June 12, 2026 05:29
Rename the scalar projections into the Scalar namespace anchored at
`def Scalar (p : Prime)`:

  scalarLo       -> Scalar.lo
  scalarHi       -> Scalar.hi
  scalarValueNat -> Scalar.valueNat

so binders typed `s : Scalar p` read with dot-notation (`s.lo.val`,
`s.hi.val`) and call sites read `Scalar.valueNat s` instead of a
prefixed top-level name. The multiScalarMul builtin and the existing
stdlib EmbeddedCurveOps call sites/docstrings follow; the stdlib
bridge lemma is renamed to `valueNat_eq_crypto_valueNat` accordingly.
Deterministic square root over the BN254 scalar field (the Grumpkin
base field) at Lampe.Crypto.Bn254.Sqrt: `sqrt : Fp p -> Option (Fp p)`
with an Euler-criterion gate that short-circuits
non-quadratic-residues in milliseconds, so the algorithm terminates in
reasonable time for both QR and non-QR inputs. Needed by the concrete
Pedersen hash-to-curve model.

Correctness is a real proof of the Tonelli-Shanks loop - no
brute-force enumeration fallback - via the invariant

  TSInv (a M c t R) : Prop :=
    R * R = a * t           -- running candidate root
    t ^ (2 ^ (M-1)) = 1     -- t has order dividing 2^(M-1)
    c ^ (2 ^ (M-1)) = -1    -- c has order exactly 2^M
    1 <= M

established at the initial state (using euler_one_of_exists_sqrt and
the non-residue fact), preserved by each tsStep, and giving R * R = a
at termination M = 0 -> t = 1. sqrt_correct/sqrt_complete are stated
against some/isSome; they are model-fidelity assurances with no
downstream consumers by design.

The file has zero native_decide calls: the BN254-specific arithmetic
facts (nonResidue is a non-residue; the oddPart bound) use kernel
decide via fpowR square-and-multiply bridges. Constants carry
descriptive names: twoAdicity, nonResidue, oddPart.
…iltin

Noir's `#[builtin(assert_constant)]` is a compiler hint that's semantically
a no-op: takes any value of any type and returns unit. Wrapper proofs that
step through extracted Noir bodies need to traverse `(#_assertConstant ...)`
calls - notably in `std::hash::derive_generators` and
`std::verify_proof_with_type` - so we expose it as a real total pure
builtin (rather than a `stub`).

The Lampe model accepts all inputs unconditionally. Noir's runtime panics
if the value isn't a compile-time constant; in the ZK semantic this is
fine because we reason about proven circuits, not the constraint
generator.

Includes the `Steps.lean` recognizer so `steps` traverses
`assertConstant` calls automatically.
Add the pure canonical-scalar layer to Lampe.Crypto.EmbeddedCurve and
ascribe the matching range constraint to the multiScalarMul builtin:

- Scalar.Canonical: s.lo.val < 2^128 and s.hi.val < 2^126, matching
  the asymmetric LO_BITS = 128 / HI_BITS = 126 split that
  Barretenberg's cycle_group::batch_mul gadget actually
  range-constrains (cycle_scalar.hpp; see
  docs/lampe-builtin-constraint-semantics.md for the source pin). The
  docstring pins the witness-limb range-constraint enforcement paths
  and the constant-limb caveat to a concrete aztec-packages commit:
  the bounds guarantee a unique limb decomposition of a value below
  2^254 (limb-uniqueness, not modular canonicity).
- Scalar.canonicalDecomp (with a mkScalar constructor mirroring
  mkPoint) and its lemmas: canonicalDecomp_Canonical,
  canonicalDecomp_decomposes, canonicalDecomp_unique,
  valueNat_inj_canonical, plus the Decidable instance for
  Scalar.Canonical. The uniqueness lemma needs the from_field_unsafe
  canonical-range disjunction: with pow128^2 about 4*p on BN254,
  canonical limbs alone admit multiple decompositions of the same
  field element (e.g. (0, 0) and (plo, phi) both decompose 0); the
  disjunction forces the Nat sum below p.
- The multiScalarMul builtin in Lampe/Builtin/Crypto/EmbeddedCurve.lean
  enforces both gadget constraints in its precondition:

    (forall i, (curvePoint? (points.get i)).isSome) /\
    (forall i, Scalar.Canonical (scalars.get i))

  This is the source-pinned semantic ascription of the gadget's
  behaviour: the on-curve constraint is the curve-relation check
  inside cycle_group::batch_mul, and canonicality is the
  create_limbed_range_constraint emitted on every input scalar via
  straus_scalar_slice.cpp.

pow128 becomes a single shared definition in Lampe/Data/Field.lean
(the low-limb base for both the BN254 limb decomposition and the MSM
gadget's scalar split), replacing the duplicate def-equal constants in
Crypto/Bn254 and Crypto/EmbeddedCurve.
…specs

- scalar_from_field_spec surfaces Scalar.Canonical in its
  postcondition: scalar.val < r_scalar < 2^254 already forces
  hi.val < 2^126 in the limb equation, so callers no longer re-derive
  the MSM gadget's canonicality precondition.
- The MSM call is exposed at both the builtin and wrapper levels in
  two spec forms: *_spec (result equation only, for rw-style
  downstream proofs) and *_combined_spec (canonicality /\ result
  equation, what Pedersen's deterministic corollaries consume).
- The stdlib Scalar namespace keeps the load-bearing extraction-side
  definitions (type, denote, mk, eq, fromBytes?, ...); the thin shims
  over the crypto-side Scalar.lo/Scalar.hi/Scalar.valueNat are not
  duplicated. The file opens Lampe.Crypto.EmbeddedCurve: the two
  Scalar namespaces share no member names, so the short Scalar.*
  spellings stay unambiguous.
- EmbeddedCurveOps/Bn254 and Field/Bn254 follow the core constant
  unification (shared Lampe.pow128).
Concrete reference semantics for Noir's `derive_pedersen_generators`
foreign builtin, matching the algorithm in
barretenberg/cpp/src/barretenberg/ecc/groups/group.hpp:

- Lampe/Crypto/Pedersen.lean: concrete pedersenGeneratorMathlib (a
  def, not opaque) implementing the two-stage protocol - build a
  64-byte preimage BLAKE3(domain) || BE(index) || 0x28, then a
  try-and-increment loop bounded by 256 attempts; each attempt parses
  two BLAKE3 digests as a 512-bit big-endian integer, reduces mod the
  Grumpkin base field, and recovers y via Bn254.Sqrt from the parity
  bit (MSB of hash_hi[0]). The wrapper pedersenGeneratorPoint threads
  the Bn254Scalar typeclass through a dependent-if on
  p.natVal = r_scalar, and @[simp] pedersenGeneratorPoint_eq exposes
  the reduction to encodeCurvePoint of pedersenGeneratorMathlib.
- Pure pedersenCommitment / pedersenHash: the canonical-decomposition
  MSM closed forms (domain bytes + input vector + separator offset),
  with rfl unfolding lemmas, plus defaultDomainBytes,
  pedersenHashLengthBytes and the generic bridging lemma
  derivePedersenGenerators_h_enc. Domain separators are typed as
  List (BitVec 8) end to end, so the byte width is explicit and the
  literals cannot overflow.
- Lampe/Builtin/Crypto/Pedersen.lean: the derivePedersenGenerators
  foreign-builtin descriptor (total pure, generic in the array sizes
  (N, M)), wired to the concrete semantic model, with bytesToList
  (the only Tp-dependent definition) living next to it.
STHoare specs for the 5 Noir stdlib functions on top of
derive_pedersen_generators: derive_generators_spec,
from_field_unsafe_spec (a substantive exists-limbs decomposition
`scalar = xlo + 2^128 * xhi` with the canonical-range disjunction; the
limb relation is field-level, since Noir doesn't
assert_max_bit_size<128> here - that's what "unsafe" means), and the
commitment/hash pairs.

The public interface is the four deterministic _spec_canonical
theorems, with no caller-supplied canonicality witness:

  r = encodeCurvePoint (pedersenCommitment ...)    (resp. pedersenHash)

Canonicality of the existentially-witnessed scalars is surfaced from
the MSM combined specs (for hash, directly from the builtin combined
spec, since the hash body invokes the builtin without going through
the multi_scalar_mul wrapper), and Scalar.canonicalDecomp_unique
collapses the existential to Scalar.canonicalDecomp, so the sum reduces to
the pure closed form. The with-separator existential _spec variants
are proof intermediates only and stay private.

The per-slot triple (limb relation / canonical-range disjunction /
limb-range canonicality) is factored into a single
FromFieldUnsafeWitness predicate, and fromFieldUnsafeRel is
element-level so the hash spec's loop invariant reuses it instead of
restating it inline.
…ests lib

- Lampe/Tests/Pedersen.lean validates the pure pedersenCommitment /
  pedersenHash functions against Aztec's published test vectors from
  assert_pedersen in the Noir stdlib: 20 example blocks covering all
  N in {1..10} for both pedersen_hash_with_separator and
  pedersen_commitment_with_separator with inputs [1, ..., N] and
  separator N, each closed by native_decide. This certifies
  byte-for-byte agreement with Barretenberg's BLAKE3 hash-to-curve,
  try-and-increment counter handling, parity convention, generator
  index encoding, and MSM composition.
- Lampe/Tests/EmbeddedCurveOps.lean validates the Scalar.canonicalDecomp /
  Scalar.Canonical machinery on BN254: edge cases (0, 1, -1, pow128,
  pow128 - 1, pow128 * phi, a random-looking value), the limb identity
  f = lo + 2^128 * hi, and the (0, 0) vs (plo, phi) uniqueness corner
  case that motivates canonicalDecomp_unique's disjunction
  hypothesis.

Both live in the Lampe Tests lib (mirroring Lampe/Tests/Blake3.lean)
so they are exercised by the regular build, instead of standalone
stdlib files requiring direct `lake env lean` invocation.
@Eduardogbg Eduardogbg merged commit 5ac9792 into main Jun 12, 2026
13 checks passed
@Eduardogbg Eduardogbg deleted the edu/pedersen branch June 12, 2026 05:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants