This issue covers the steps required to integrate the Poulpy FHE library (specifically the poulpy-ckks crate) as a backend in HEIR.
Poulpy uses a bivariate Torus representation (base-2^{base2k} digits) instead of the traditional RNS representation. This allows bit-level precision and capacity management, which simplifies parameter selection but introduces new optimization opportunities (like lazy normalization).
Integration Steps
Add Poulpy as a Bazel Dependency
- Add poulpy-ckks and a concrete backend crate (e.g., poulpy-cpu-ref for testing) to MODULE.bazel using the Rust crate extension.
- Add any required transitive crate dependencies, if needed.
Define the poulpy Exit Dialect
- Create the poulpy dialect in lib/Dialect/Poulpy.
- Types:
  - !poulpy.ciphertext (maps to CKKSCiphertext)
  - !poulpy.unnormalized_ciphertext (maps to UnnormalizedCKKSCiphertext)
  - !poulpy.plaintext (maps to CKKSPlaintext)
  - !poulpy.module (maps to Module<BE>, the evaluator context)
  - !poulpy.scratch (maps to ScratchArena)
  - !poulpy.tensor_key (maps to GLWETensorKeyPrepared)
  - !poulpy.automorphism_key_map (maps to GLWEAutomorphismKeyHelper for rotations)
- Operations:
  - Setup: module_create, scratch_create
  - Data: encrypt, decrypt, encode, decode
  - Arithmetic (Normalized): add, sub, mul (takes tensor_key), rotate (takes automorphism_key_map), rescale
  - Arithmetic (In-place/Assign): add_assign, sub_assign, mul_assign, rotate_assign, rescale_assign
  - Arithmetic (Unnormalized): add_unnormalized, sub_unnormalized
  - Maintenance: normalize (unnormalized -> normalized), compact_limbs
- The poulpy exit dialect should be designed to work with memrefs rather than tensors, since we are migrating all backends to bufferized implementations.
Implement Code Generation (heir-translate)
- Implement PoulpyEmitter in lib/Target/Poulpy.
- Translate poulpy dialect operations to Rust code calling poulpy-ckks APIs.
- Key generation/management: Prior passes will be responsible for generating poulpy dialect code that sets up the Module<BE> and allocates the ScratchArena before providing them to the main compiled function as function arguments.
- Error handling: Poulpy APIs return Result. The generated Rust functions should return Result and use the ? operator for error propagation.
- Generics: The target backend is known when the code is generated. If the generated functions cannot be generic over the backend (BE: Backend), the emitter should query the IR to determine which poulpy backend is in use and emit code for that concrete backend.
Implement a configure-crypto-context-like pass
- As mentioned in "Implement Code Generation", there should be a pass that sets up any needed global objects or state required for the API. This is analogous to CryptoContext in the OpenFHE backend. This pass introduces new functions that the user will call at server startup time.
- Signature Conversion: Modify function signatures to accept !poulpy.module, !poulpy.scratch, and required keys (tensor_key, automorphism_key_map) based on the operations used in the function.
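The signature conversion can be sketched as a lookup from the operations a function uses to the extra arguments it needs. This is an illustrative Python model rather than HEIR code: the helper name `required_context_args` and the representation of a function body as a list of op names are hypothetical, and it assumes every function receives the module and scratch arena. The op and type names are the ones listed in the dialect outline.

```python
# Hypothetical model of signature conversion: which extra arguments a
# compiled function needs, derived from the poulpy ops in its body.
KEY_REQUIREMENTS = {
    "mul": "!poulpy.tensor_key",
    "mul_assign": "!poulpy.tensor_key",
    "rotate": "!poulpy.automorphism_key_map",
    "rotate_assign": "!poulpy.automorphism_key_map",
}


def required_context_args(op_names):
    """Return the extra argument types for a function, in order of first use."""
    # Assumption: every function takes the evaluator module and scratch arena.
    args = ["!poulpy.module", "!poulpy.scratch"]
    for name in op_names:
        key = KEY_REQUIREMENTS.get(name)
        if key is not None and key not in args:
            args.append(key)
    return args
```

A function that only adds and subtracts would get just the module and scratch arguments, while one that also multiplies would additionally take a tensor key.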
Implement Parameter Selection Pass
Poulpy's bivariate Torus representation simplifies parameter selection because we do not need to choose specific RNS prime chains.
- Modulus Size (CT_K): We only need to decide the total bit size of the modulus. This is computed as $Q_{\text{bits}} = (L + 1) \cdot \Delta_{\text{bits}} + B_{\text{bits}}$, where $L$ is the multiplicative depth, $\Delta_{\text{bits}}$ is the scale (log_delta), and $B_{\text{bits}}$ is the headroom (log_budget).
- Headroom (log_budget): HEIR's RangeAnalysis must be used to determine the maximum integer value size in the circuit, ensuring it fits in the headroom to prevent torus wrap-around.
- Limb Size (base2k): Should be chosen as the largest supported by the target hardware backend (typically 52 or 64).
- Create a pass (e.g., GenerateParamPoulpy) to determine:
  - log_delta (scaling factor, typically constant per circuit).
  - log_budget (headroom, can we use HEIR's RangeAnalysis for this if we have bounds on the input ranges?).
  - CT_K (maximum capacity in bits): Computed as (multiplicative_depth + 1) * log_delta + headroom.
  - base2k (limb size, e.g., 52).
- Annotate the MLIR module with these parameters.
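As a worked example of the capacity formula, here is a minimal Python sketch. The function name `poulpy_params` and the returned dictionary are hypothetical, and the limb count computed as the ceiling of CT_K / base2k is an assumption about how capacity maps onto base-2^{base2k} digits, not something confirmed against the poulpy-ckks API.

```python
def poulpy_params(mult_depth, log_delta, log_budget, base2k=52):
    """Compute CT_K = (L + 1) * log_delta + log_budget and a limb estimate."""
    if mult_depth < 0 or log_budget < 0 or log_delta <= 0 or base2k <= 0:
        raise ValueError("parameters must be non-negative, with positive log_delta and base2k")
    ct_k = (mult_depth + 1) * log_delta + log_budget
    # Assumption: the number of limbs is the ceiling of CT_K / base2k.
    num_limbs = -(-ct_k // base2k)
    return {
        "log_delta": log_delta,
        "log_budget": log_budget,
        "ct_k": ct_k,
        "base2k": base2k,
        "num_limbs": num_limbs,
    }
```

For a circuit of multiplicative depth 3 with log_delta = 40 and log_budget = 10, this gives CT_K = 4 * 40 + 10 = 170 bits, which under the limb assumption above needs 4 limbs at base2k = 52.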
Implement Lowering Pass SecretToPoulpy
The main difficulty here is that our LWE dialects are incompatible with poulpy's bivariate ring representation. They hard-code RNS concepts that Poulpy doesn't have. So while we could try to generalize/update the LWE and CKKS dialects, instead we should just implement a lowering from secret (arithmetic on secret packed cleartext tensors) to poulpy directly. Then after the initial e2e implementation is working, we can decide how/whether to have a full representation of the scheme at the LWE level, and what other changes to the pipeline are necessary to support that.
- Create lib/Dialect/Secret/Conversions/SecretToPoulpy.
- Scale Management: Since the upstream pipeline should be configured to disable explicit scale management (see Section 3), this pass typically will not encounter mgmt.rescale operations in standard flows.
Implement a poulpy-mgmt pass for ciphertext management
This should replace the existing insert-mgmt-ckks sub-pipeline when targeting the poulpy backend.
Sub-passes should include:
- limb compaction: Insert compact_limbs after multiplications to reclaim unused limbs (since multiplication reduces effective_k).
- lazy normalization: Identify chains of additions/subtractions and convert them to add_unnormalized/sub_unnormalized to avoid intermediate carry propagation.
- Insert normalize operations only before operations that require normalized inputs:
  - Multiplication (mul, mul_assign)
  - Rotation (rotate, rotate_assign)
  - Function return
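The lazy normalization and normalize-insertion rules can be modeled on a straight-line list of operations. The sketch below is a simplified Python illustration, not the MLIR pass: the `Op` record, the `_norm` naming, and the greedy policy of making every add/sub unnormalized are all assumptions, and it ignores any limit on how many unnormalized additions can accumulate before carries must be propagated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    result: str      # name of the value this op defines ("" for return)
    name: str        # e.g. "add", "sub", "mul", "rotate", "return"
    operands: tuple  # names of the ciphertext values consumed


LAZY = {"add": "add_unnormalized", "sub": "sub_unnormalized"}
NEEDS_NORMALIZED = {"mul", "mul_assign", "rotate", "rotate_assign", "return"}


def lazy_normalize(ops):
    """Rewrite add/sub to unnormalized forms; normalize only where required."""
    unnormalized = set()   # values produced by unnormalized ops
    normalized_alias = {}  # unnormalized value -> its normalized copy
    out = []
    for op in ops:
        operands = op.operands
        if op.name in NEEDS_NORMALIZED:
            fixed = []
            for v in operands:
                if v in unnormalized:
                    if v not in normalized_alias:
                        alias = v + "_norm"
                        out.append(Op(alias, "normalize", (v,)))
                        normalized_alias[v] = alias
                    v = normalized_alias[v]
                fixed.append(v)
            operands = tuple(fixed)
        if op.name in LAZY:
            out.append(Op(op.result, LAZY[op.name], operands))
            unnormalized.add(op.result)
        else:
            out.append(Op(op.result, op.name, operands))
    return out
```

On a chain such as add, sub, mul, return, this model emits a single normalize before the mul rather than one after each addition, and reuses a normalized copy if the same unnormalized value is consumed more than once.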
End-to-End Testing
- Create a new set of pipeline flags and configurations related to the poulpy backend, and assemble the right pipeline for the middle-end (secret + mgmt optimizations -> poulpy).
- Create a Bazel macro heir_poulpy_lib (similar to heir_lattigo_lib) to automate heir-opt -> heir-translate -> rust_library compilation.
- Write Rust test harnesses that import the generated code, initialize Poulpy backends (e.g., poulpy-cpu-ref), encrypt inputs, run the function, and verify outputs.
Add an in-place optimization pass at the poulpy level
This should be similar to the lattigo/openfhe in-place optimizations. It can be treated as an extra optimization, though it should be enabled by default.
Open questions
- Do we need to represent the rescale operation in the poulpy dialect? The poulpy-ckks API has rescale and rescale_assign operations, but when would we use them? Maybe for apples-to-apples comparisons of CKKS programs? An LLM suggests we may want a ckks_rescale_into to copy ciphertexts into "structs with pre-defined smaller capacities," but this seems like it could be irrelevant to our purposes.