Skip to content

wip: Rotom layout alignment, assignment, and lowering#2980

Draft
edwjchen wants to merge 62 commits into
google:mainfrom
edwjchen:rotom_layout
Draft

wip: Rotom layout alignment, assignment, and lowering#2980
edwjchen wants to merge 62 commits into
google:mainfrom
edwjchen:rotom_layout

Conversation

@edwjchen

Copy link
Copy Markdown
Collaborator

do not merge yet!

@AlexanderViand AlexanderViand marked this pull request as draft May 21, 2026 16:52
@asraa asraa self-requested a review May 21, 2026 16:52
edwjchen added 26 commits June 11, 2026 21:23
Add withRolls() and enumerateSingleRolls() to LayoutAlignment. The latter
ranges over ordered pairs of distinct, equal-extent slot-side traversal dims
and returns the materializable single-roll variants of a layout, the building
block for seeding rolled (diagonal) packings into the layout search.

Rolls only affect the slot line of the materialized layout, so a roll whose
'from' dim is ciphertext-side lowers as a no-op; restrict both roll indices to
the slot side so every variant is a genuinely distinct packing.
Two pipeline-execution tests proving rolled (diagonal) layouts compute
correctly through layout assignment, materialization, the relation-driven
lowering, and the interpreter:
- a shared rolled layout on both addf operands, and
- a rolled operand added to a row-major operand, which forces the lowering
  to remap between the rolled and unrolled packings.

Validates the premise that the existing brute-force path already handles
rolled layouts, de-risking rolled-variant injection into the search.
Add RotomTensorOpLowering::lowerMatmulByRotation, a diagonal matmul kernel
that replaces the brute-force per-scalar mask+remap lowering with ciphertext
rotations. For a single-ciphertext layout it groups every (i,j,k) contraction
term by the rotation pair that realizes it -- (srcSlot - dstSlot) mod numSlots
for each operand -- and emits one masked rotate-multiply per group, accumulated
into the destination operand. The construction is correct for any such layout
(each contribution is realized exactly once at its destination, with a
shift-collision guard and single-point/single-ciphertext gates that fall back
to the brute-force path), and collapses to few rotations exactly when the
layout is rolled/diagonal: a Halevi-Shoup matvec with a diagonal-packed matrix
uses 14 rotations vs 32 row-major.

Also add tensor_ext.rotate support to the OpenFHE interpreter so the kernel is
execution-verifiable, update the matmul lowering golden (remap -> rotate), and
add an execution test asserting the rolled packing uses fewer rotations and
still computes the correct matvec.
…NIST@32768

Implements the Halevi-Shoup ciphertext-axis diagonal matvec with a
baby-step/giant-step schedule, auto-discovered from a row-major seed, and
threads it through the whole seed -> assign -> materialize -> convert
pipeline so a full MNIST 784->512->10 MLP lowers at the reference
ciphertext size 32768 with the reference packing (16-ciphertext layer 1).

Kernels (RotomTensorOpLowering):
- lowerMatvecCtDiagonalBsgs: square/squat single-period + replicated BSGS.
- lowerMatvecDenseDiagonal: P=N/K diagonals per ciphertext (dense packing).
- Both handle non-power-of-two M and K by doing slot arithmetic over the
  padded extents Dp=nextPow2(m), Kp=nextPow2(n) while verifying only the
  real [0,m)x[0,n) domain (padding is zero); identical for power-of-two dims.
- Layout verification rebuilt the ISL relation per matrix element (O(m*n)
  ISL builds, minutes at network scale); now indexes each relation once
  (indexRangeByDomain) -> per-element lookups. Convert of a 512-hidden layer
  drops from minutes to ~16s.

Search (LayoutAssignment):
- Generates the rolled diagonal/identity candidates and a rotation-aware
  cost so the matvec kernel is discovered without a hand-written layout.
- Kernel gate checks the squat contract on padded extents.
- Fixes a back-propagation break where a bare operand candidate shadowed an
  intermediate's candidate (chained matmuls losing their kernel).

Seed (SeedLayout): replication-to-fill -- a tensor smaller than the
ciphertext (e.g. a 10x512 weight at n=32768) now gets a seed (full slot
packing + a replication dim) so the search has a candidate to roll.

Secret path (MaterializeTensorExtLayout): copy a secret.generic operand's
materialized layout onto the corresponding function-argument attribute, so
the type converter converts the secret arg (fixes a block-arg/operand type
mismatch when a secret vector materializes wider than its data).

Utils/Layout: guard isRelation{Bicyclic,RowMajor,PerRow,SquatDiagonal}
against incompatible Presburger spaces (a replicated layout has extra local
vars) instead of asserting inside IntegerRelation::isEqual.

Pipeline: register --mlir-to-rotom-ciphertext (seed -> assign -> materialize
-> convert) so high-level tensor IR in secret.generic regions lowers in one
command with no hand-written layouts.

Tests: RotomPipelineExecutionTest gains end-to-end numerical coverage for
the square/squat/dense/replicated/under-filled/padded matvec regimes, a
two-layer MLP (matvec -> square -> matvec), and an MNIST-layer-scale
512x1024 -> 16x512 MLP; plus a lit test for the registered pipeline.
A neural-net layer y = x . W^T is a vector-times-matrix matmul x(1xK) * M(KxN),
but the ciphertext-axis diagonal matvec kernel only handles matrix * column-
vector. SeedLayout now rewrites such matmuls (gate: lhs.dim0==1 && rhs.dim1>1)
into transpose(matmul(M^T, x^T)) before seeding, reusing the existing kernel;
when M = transpose(W) the pre-transpose W is used directly to avoid a redundant
transpose-of-transpose. The matrix is then W (NxK) and must be squat (N<=K),
which the real MNIST layers (512x784, 10x512) satisfy.

Both matmuls of the real MNIST model now lower to RotomMatmul with no leftover
matmul. Adds a lit test for the rewrite.

Also guards two more IntegerRelation::isEqual call sites (ConvertTensor
{Collapse,Expand}Shape) against incompatible Presburger spaces -- exposed by the
rewrite's vector transposes lowering to reshapes over replicated layouts.
Seven fixes from the branch code review:

- SeedLayout: only reuse a transpose producer's input when it is a true
  [1,0] permutation, else an identity transpose would build an invalid
  matmul(K x N, K x 1).
- LayoutAssignment::visitMatmul: bail on dynamic shapes (matching the
  lowering) so the padded-extent cost model never divides by kDynamic.
- MaterializeTensorExtLayout: error on a function argument feeding
  secret.generic operands with conflicting materialized layouts instead
  of silently keeping the first.
- verifyLayoutRolls: reject a zero-extent rolled dim before the modulo
  (was a div-by-zero in the verifier).
- lowerMatmulByRotation: index each relation once via indexRangeByDomain
  rather than rebuilding the IntegerRelation per (i,j,k) domain point.
- lowerMatvecCtDiagonalBsgs: drop the dead 'n > Kp' gate (Kp = pow2(n)).
- LayoutAssignment: use llvm::Log2_64_Ceil instead of a hand-rolled
  ceilLog2.

heir-opt builds clean; all 28 Rotom tests pass.
Addresses two more code-review findings:

google#4 layoutNumCiphertexts overcounted a dense straddling layout. Expose the
straddle-aware inferCtPrefixLen from RotomAttributes (moved out of the
anonymous namespace, declared in the header) and route the Rotom layout
cost utilities through it, so a boundary dim that spans the ct/slot split
contributes only its high (ciphertext) part. Adds a unit test pinning the
2-ciphertext count for a 4x8 straddling layout at n=16.

google#11 the 'spaces compatible then isEqual' guard was duplicated at six sites.
Factor it into relationsCompatibleAndEqual (lib/Utils/Layout) and route the
four isRelation* pattern matchers and the two collapse/expand checks
through it. Deliberately NOT isRelationEqual: that adds an ISL sampling
(tryProveUnequal) pass that is wasted work on these one-off comparisons and
measurably slowed layout propagation.

heir-opt builds clean; layout_propagation, convert_to_ciphertext_semantics,
and the Rotom suites all pass.
Code-review finding google#6. createMaskForPoints is atomic (validates all points,
then builds one constant) but the three diagonal matvec kernels called it
*after* emitting rotate/mul ops, so an (unreachable) mask failure returned
failure() with a partially built chain left in the IR -- the caller treats
that as 'kernel not applicable' and falls back, orphaning the dead ops.

The masks are constants independent of the accumulation, so build them up
front: the BSGS and dense kernels create the single gap mask right after the
builder, and lowerMatmulByRotation pre-builds every group's mask before the
emit loop. Now the only fallible step runs before any mutation, so a failure
cannot orphan a rotation chain.

Rotom execution + lit tests pass.
Code-review finding google#13. The func-arg layout propagation hand-rolled the
block-arg -> func-arg attribute storage (dyn_cast<FunctionOpInterface> +
getArgNumber + getArgAttr/setArgAttr). The shared association layer already
implements exactly that routing (rule 1: a FunctionOpInterface block argument
maps to its arg attr), so use findAttributeAssociatedWith to read the existing
arg layout and setAttributeAssociatedWith to write it. Keeps the func-arg
targeting (only function arguments need this) and the conflict diagnostic.

No BUILD change (AttributeUtils already a dep). Rotom tests pass.
Code-review finding google#12. normalizeRowVectorMatmuls was a structural matmul
rewrite (x.W^T -> transpose(matmul(W, x^T))) hardcoded at the top of the
SeedLayout pass, which is otherwise about seeding layout attributes. Move it
into a dedicated rotom-normalize-matmuls pass so the rewrite is independently
registered, testable, and reorderable.

- New pass NormalizeMatmuls (td/h/cpp) carrying the verbatim rewrite, with
  Linalg/Tensor as dependent dialects (it creates those ops).
- SeedLayout drops the rewrite, its call, and the now-unused Linalg/Tensor
  includes + BUILD deps.
- The mlir-to-rotom-ciphertext pipeline runs rotom-normalize-matmuls before
  rotom-seed-layout; the lit test invokes it explicitly.

heir-opt builds clean; Rotom lit + execution tests pass.
Code-review finding google#9. lowerMatvecCtDiagonalBsgs and lowerMatvecDenseDiagonal
duplicated ~120 lines of baby-step/giant-step accumulation, squat residual
rotate-and-sum, and gap masking. Factor that into one emitDiagonalBsgs helper
parameterized by the contraction vector, an extract-diagonal callback, the
diagonal count, the rotation modulus, the residual limit, the gap mask, and an
optional layout tag.

The tag is a parameter rather than unified behavior on purpose: the
single-period kernel works at the full ciphertext width (numSlots == output
width) so its intermediates carry the output layout, whereas the dense kernel
works K-wide (!= the N-wide output) and must not tag them. Each kernel keeps
its own structure verification, gap-mask construction, and output placement.

Behavior-preserving: the execution test (both single-period and dense matvecs)
and the Rotom lit tests pass.
Replace the out-parameter with a small struct { size_t length; int64_t
straddleSlotExtent; } so the don't-care caller no longer declares a throwaway
local. Behavior unchanged; build + 28 tests pass.
…actor phase A)

Relation-preserving first step toward a strided layout model. The ct/slot
straddle was tracked as a StraddleRole{None,High,Low} per piece plus a global
straddleSlotExtent, special-cased in the ISL address emitter (High -> floorDiv,
Low -> mod). Replace that with a per-piece mixed-radix digit descriptor on
LayoutData: pieceDivBy/pieceModBy, where a piece reads digit = (i / divBy) mod
modBy of its tensor index. A whole-dim piece is (1, 0) => i; a straddle splits
into a ct piece (L, 0) => i/L and a slot piece (1, L) => i mod L.

This drops the StraddleRole enum and the straddle special-casing in
emitSegmentAddress/emitSplitCtSlotIsl in favor of one uniform digit rule that
also generalizes to N-way splits (groundwork for making the attribute stride the
address weight). Attribute syntax and seeding are unchanged; the emitted ISL
relations are byte-identical -- the lowering unit test, materialize lit tests,
and execution test all pass.
…ensor axis

Refactor phase B. Make the dim attribute's stride meaningful and unify the old
'repeated dim id' tiling with the straddle's mixed-radix split:

- getDim() is the tensor axis. Pieces sharing a getDim() are a mixed-radix split
  of that one axis and share its domain variable (preprocessing dedups the
  traversal entry per tensor dim).
- For a multi-piece axis, the attribute stride is the within-axis digit divisor:
  digit = (i / stride) mod extent, with strides being the cumulative products of
  the lower extents (1, e0, e0*e1, ...). Placement and the ct/slot split stay
  positional (the cumulative product of extents crosses n at the boundary).
- A single-piece axis keeps the legacy behavior (stride ignored, digit == i), so
  existing layouts -- including seeds with non-unit strides and the detected
  straddle -- are byte-identical.
- LayoutAttr::verify rejects an axis whose pieces are not a valid mixed-radix
  decomposition (e.g. duplicate strides).
- The ISL emitter now sums per piece, not per dim, so an axis can place several
  digits in one segment.

The old position-based 'tiled duplicate dim' interpretation is replaced by this:
the tiled-row-major tests keep the same (ct,slot) packing, now expressed with one
domain variable per axis. Full Rotom suite (lit/unit/execution),
convert-to-ciphertext-semantics, and layout_propagation pass.
…muls pass

Removes all Rotom matmul/diagonal machinery, keeping elementwise and
rolled-layout functionality:
- lowering kernels (lowerMatmul, diagonal/BSGS matvec) in RotomTensorOpLowering
- the convert-to-ciphertext matmul pattern branch (supportsRotomMatmul)
- matmul cost/layout/kernel-selection paths in LayoutAssignment
- the NormalizeMatmuls pass (.td/.h/.cpp + BUILD/Passes.h/heir-opt wiring)
- the dead KernelName::RotomMatmul enum value across Kernel + LayoutOptimization
- dependent matmul lit tests and the matmul lowering unit test
With the diagonal matvec layouts removed, the only producer of un-split
straddling dims is gone: SeedLayout always splits a boundary-spanning axis
into an explicit ct piece + slot piece. So inferCtPrefixLen no longer needs
to detect a straddle and auto-insert a slot piece during preprocessing.

- inferCtPrefixLen returns size_t (the CtPrefix struct collapses to its one
  remaining field); a dim that does not fit the slot budget simply stays on
  the ciphertext axis.
- preprocessLayoutData drops the auto-split branch; boundary-spanning axes
  must be expressed as explicit mixed-radix splits.
- layoutNumCiphertexts counts the ct prefix directly, no straddle divide.
- Migrate the straddle unit test to an explicit mixed-radix split.
A roll(i, j) shifts one dim by the other modulo their shared extent, so the
two rolled dims must have equal extents (only the extents must match; strides
may differ). Tighten the verifier accordingly and drop the dead non-zero
extent check (DimAttr::verify already guarantees size > 0).
Rename LayoutData::pieceDivBy -> pieceStride and pieceModBy -> pieceExtent to
match Rotom's [dim:extent:stride] notation. pieceExtent now always holds the
piece's true extent (never a 0 sentinel): the "drop the redundant modulus on
the most-significant digit" optimization moves from a preprocessing sentinel to
an emitter-side check (stride * extent < the axis full extent). Byte-identical
ISL output; also simplifies preprocessLayoutData.
…ivalence

Reverts the pieceStride/pieceExtent rename. Since the sibling field is the
positional address coeff (not a Rotom dim), the digit descriptor keeps the
implementation-oriented divBy/modBy names (and the 0 = no-modulus sentinel),
with a comment noting they are the piece Rotom stride/extent.
edwjchen added 30 commits June 22, 2026 17:08
Separate the two intertwined concerns: generateCandidates() runs a forward
pre-order walk that fills the candidate map for every value (returns skipped),
then selectLayouts() runs the backward search from each return over the now-
complete candidate space. visitReturn moves out of the generation TypeSwitch
into the search phase. Behavior-preserving (the candidate map is the one-way
handoff); method declarations are grouped into generation vs search, with the
generation visitors kept self-contained to ease moving each op to its own file
as the kernel space grows.
It has no callers outside LayoutAlignment.cpp and only adapts a LayoutAttr to
the (dims, n) core in RotomAttributes. Drop the public header declaration and
mark the definition static so there is one public inferCtPrefixLen, not a
same-named overload across two headers.
…ridge)

Stage 3a of wiring conversions to the shift network: shiftNetworkConversionCost
bridges two rotom layouts to tensor_ext (the same lowerToTensorExtIsl the
materializer uses) and calls computeCostOfLayoutConversion, which composes
inverse(from) o to into a (ct,slot) mapping, finds a Vos-Vos-Erkin shift scheme,
and counts rotations. This is the real cost that conversionMoves only proxied;
conversionMoves stays the cheap aligned fast-path. Additive -- not yet wired
into the search loop (that needs caching + expected test-churn).
…earch

Stage 3b: chooseElementwiseKernels now scores conversions with the real
Vos-Vos-Erkin rotation count (cachedConversionCost) instead of the move-count
proxy. cachedConversionCost gates on the conversionMoves(...) aligned fast path
(0, no VVE), caches by layout pair, and falls back to the proxy when a layout
cannot be lowered to tensor_ext. Adds tensor_ext to getDependentDialects.

Also fixes a latent bug this exposed in the shared cost model: the
RotationCountVisitor in LayoutConversionCost.cpp asserted on SplatNode (a
broadcast constant) -- it now counts it as 0 rotations like the other leaves.

Re-blesses layout_assignment.mlir: two elementwise functions select different
(still-valid) layouts under the real cost; the pipeline-execution tests confirm
the selections remain lowerable and numerically correct.
withRolls / enumerateSingleRolls were only exercised by unit tests -- the layout
search never enumerates rolls, so the rolled-candidate path is unverified
end-to-end. Remove them and their tests. The roll *lowering* in the emitter
stays (materialize_rolled_row_major and the rolled pipeline/evaluate tests still
pass). Also note on supportsRotomAlignmentLowering that it is elementwise-
specific and should be decomposed per tensor operator as other ops gain compute
kernels with different alignment goals.
Keep the behavioral pipeline tests -- the end-to-end RotomPipelineExecutionTest
(runs seed/assign/materialize/convert + checks numerics) and the materialize /
IR / seed lit tests. Remove the churny ones that pin exact selected layouts or
lowering IR: the LayoutAlignmentTest and RotomTensorExtLayoutLoweringTest C++
unit tests, and layout_assignment.mlir / layout_assignment_rotom_elementwise_
lowering.mlir. Removed for now; correctness is covered end-to-end.
Replace the per-element mask/remap accumulation loop in lowerElementwiseBinary
with the convert-then-compute shape: convert each operand to the shared output
layout via tensor_ext.convert_layout (a no-op when already aligned), then one
elementwise op at that layout. Add implement-shift-network to the rotom pipeline
(heir-opt) and the execution-test harness to lower the conversions into rotations
+ masks. Remove the now-dead align/mask/remap helpers. Execution test confirms
numeric correctness end-to-end.
Revert the test trim (2497f20): all four removed tests
(LayoutAlignmentTest, RotomTensorExtLayoutLoweringTest,
layout_assignment.mlir, layout_assignment_rotom_elementwise_lowering.mlir)
predate this work and are kept. Re-bless the elementwise lowering test to
the convert-then-compute model: func-arg operands are unified onto one
layout by the materializer, so the lowering is a bare elementwise op with
no conversion.

Also restore the original roll verifier message
("rolled dims must have the same extent (size)").
…phase 1)

Extract the op-independent free functions out of the 1080-line pass file
into two cohesive sibling units (added to the same LayoutAssignment library):

  - Candidate.{h,cpp}: candidate vocabulary (KernelKind, Candidate), arith-op
    classification, the cost model, and ranking/dedup (isBetterCandidate,
    uniqueCandidates).
  - DimMaps.{h,cpp}: the pure per-op dim maps (reduction/collapse/expand/
    extract/insert-slice) plus remapLayoutDims and the candidate-generation
    combinators (remapCandidates, chooseCommonCandidates).

LayoutAssignment.cpp keeps only the value classification helpers and the pass
itself, dropping from 1080 to 676 lines. No behavior change: the full Rotom
suite (incl. the layout_assignment.mlir layout pins and the pipeline
execution test) passes unchanged.
Subtraction is an additive, depth-0 elementwise op, so it reuses the RotomAdd
kernel (both operands at the shared layout); only the emitted arith op differs.

  - Utils: add makeAppropriatelyTypedSubOp (arith.subf/subi by element type).
  - selectRotomElementwiseKernel: map isAddLike (add + sub) -> RotomAdd.
  - RotomTensorOpLowering: emit subf/subi for SubF/SubI ops.
  - ConvertToCiphertextSemantics: register ConvertRotomElementwiseBinary for
    SubFOp/SubIOp and treat them as RotomAdd in expectedKernel().

The assignment dispatch already routed SubF/SubI to visitElementwise. Verified
numerically (execution test now checks a - b) and at the IR level (lit test).
Replace the backward greedy selection with a forward cost-DP whose accumulated
cost is correct on DAGs. Each Candidate now carries a `cone`: the chosen layout
and local cost of every value feeding it. Because the cone is keyed by Value, a
shared subexpression appears once, so `cost` (the sum over the cone) never
double-counts -- it is the true total over the candidate's assignment map.

Combining operands merges their cones and drops any pairing whose
sub-assignments disagree on a shared value, so cones stay internally consistent.
Selection is then just: at each function's returns, fold in the cheapest
consistent cone; that cone is the output assignment (no markSelected backward
pass). Kernels are re-derived from the final layouts.

On trees the cone cost equals the previous accumulated cost, so the existing
structural/elementwise tests are unchanged; a new diamond execution test covers
the shared-value DAG path. Pruning keeps min-cost per layout (exact on trees and
shallow reconvergence).
…ost"

Drop the invented "cone" vocabulary. A Candidate now carries:
  - `assignment` (was `cone`): the layout chosen for every value feeding it,
    with each kernel's local cost;
  - `accumulatedCost` (was `cost`): the dedup'd sum over that assignment.
Helpers follow: coneCost -> accumulatedCostOf, mergeCones -> mergeAssignments.
Pure rename + comment cleanup; no behavior change (24/24 Rotom tests pass).
The file-local helpers are already `static`, which gives internal linkage, so
the wrapping anonymous namespace was redundant. Drop it, keeping `static` (the
LLVM-preferred form for free functions; reserve anonymous namespaces for types).
No behavior change.
Replace the hand-rolled raw_string_ostream wrappers with MLIR's debugString
helper: drop layoutKey entirely (candidateTieKey calls debugString directly)
and reduce kernelKey to a one-liner. No behavior change.
Replace the layout-assignment cost estimates with a single cost model whose
units are commensurate, so conversions and compute can be summed meaningfully:

  - CostModel.{h,cpp} + cost_model.json: relative HE op-cost weights in
    key-switch units (rotation = ctMultiply = 100, add = 1). Defaults mirror the
    JSON; set ROTOM_COST_MODEL=<path> to override at runtime (temporary hook).
  - cachedConversionCost now returns the real Vos-Vos-Erkin rotation count
    weighted by the rotation cost (the proxy is only the unlowerable fallback).
  - operationCost: add -> add*numCt, mul -> ciphertextMultiply*numCt (was an
    ad-hoc 1x / 10x).
  - Drop the layoutConversionCost proxy; chooseCommonCandidates takes a
    conversion-cost callback so the structural ops use the real cached cost too.

Re-blessed layout_assignment.mlir (one selection shifted under the new weights);
execution numerics unchanged. 24/24 Rotom tests pass.
… contract

operationCost/genericOperationCost already route through the cost model and are
computed from the aligned (compute) layout the operands are converted to; make
that contract explicit (rename the param to alignedLayout, document it).

Fix the one op that wasn't on the model or the input layout: a reduction's local
cost was the *output* (reduced) ciphertext count, unweighted. A reduce sums its
input ciphertexts, so charge numCt(inputLayout) * add instead. No existing
selection shifts; 24/24 Rotom tests pass.
…lback

1. Remove the runtime cost-model loading: getCostModel() now just returns the
   compiled-in defaults (CostModel.cpp no longer reads ROTOM_COST_MODEL or
   parses JSON). cost_model.json stays as documentation of those defaults.
2. The shift network always produces a cost for assignable layouts (the search
   only generates materializable ones), so cachedConversionCost asserts the VVE
   result instead of falling back to the move-count proxy. Drop the now-dead
   conversionCost(moves).

No behavior change; 24/24 Rotom tests pass.
FHE packing requires static shapes (a tensor is laid out into a fixed ciphertext
slot count), so a dynamic dimension can never reach layout assignment. Remove the
isDynamic helper and its four guards in the per-op dim-map computers. No behavior
change.
The per-op dim maps are total over the input's dims (every dim resolves to a
result index or -1), and value-compatible layouts only reference in-range dims,
so remapLayoutDims can never fail. Drop the dead unmapped-sentinel (-2) branch,
assert the in-range invariant, and return a plain LayoutAttr; remapCandidates no
longer threads an optional. No behavior change.
Replace the shared group comment over getReduction/Collapse/Expand/Extract/
InsertSliceDimMap with a one-line docstring per function naming the op it maps
and how it relabels dims.
…e of truth

The JSON was no longer loaded (only documented the defaults), so drop it and
tighten the struct comment.
Its single file-local helper is now static (LLVM-preferred form for free
functions), matching the RotomAttributes.cpp cleanup.
Sweep the no-type anonymous namespaces in the files touched this session and
convert to the LLVM-preferred form (static for free functions/variables,
reserving anonymous namespaces for class declarations):
  - LayoutAssignment.cpp: the two attr-name constexprs (already internal).
  - LayoutAlignment.cpp: atomizeSlotRegion -> static.
  - RotomTensorOpLowering.cpp: static alias refs + setMaterializedAttr; usings
    move to file scope.

Left as-is: LayoutAssignment.cpp / SeedLayout.cpp anonymous namespaces wrap the
pass struct (the LLVM-recommended use), and the foundational/test files.
The tiled layout [[0:2:2],[1:2:2],[0:2:1],[1:2:1]] describes a 4x4 tensor
(logical dims 0 and 1 each split into a high/low digit, extent 4), packed into
2 ciphertexts of 8 slots. The test wrongly applied it to tensor<2x2x2x2>, whose
size-2 dims made the layout look incompatible and get silently dropped. With the
correct tensor<4x4>, the layout flows through assignment and materialization
(arg -> tensor<2x16>) and the add stays a plain ciphertext add -- still not
Rotom-lowered, since the tiled layout has non-unit strides.
The bazel-<worktree> convenience symlink (bazel-rotom-rolls-phase1) was
accidentally committed in 8bf3927 because .gitignore only listed the main
workspace's symlinks. Untrack it and generalize the ignore to /bazel-*.
…atchers

93b5624 added an isCompatible guard and deduped it into
relationsCompatibleAndEqual across the named-layout matchers
(isRelationSquatDiagonal/RowMajor/PerRow/Bicyclic) and the collapse/expand-shape
checks. That was motivated by the Rotom diagonal path, which has since been
removed, so the guard is vestigial. Restore origin/main's direct
relation.isEqual(...) and drop the helper. 26 convert-to-ciphertext lit tests
plus the Rotom/Layout suites pass.
Refactor phase A of the LayoutAssignment split. Each of the 12 per-op
candidate visitors (visitFunc/visitElementwise/visitTranspose/... ) becomes a
free static generate*(AssignmentContext&, XOp) function; visitOperation
dispatches to them. The pass implements the new narrow AssignmentContext
interface -- the seed/candidate/cost API the generators call back into -- so
each generator depends only on that, not on the pass struct. This is the
decoupling that lets the generators move to per-category files next.

Also tighten the file-local anonymous namespace to wrap only the
LayoutAssignment struct; the free helpers and generators are now plain static
functions per the LLVM convention.

Behavior-preserving: builds clean, 22/22 Rotom lit + execution tests pass.
Move getPlainValueType / isTensorLike / isLayoutCompatibleWithValue out of
LayoutAssignment.cpp into a small ValueUtils.{h,cpp}. These are the shared
value-type predicates the per-op generators rely on; giving them a header lets
the generators move to their own files (refactor phase C) without depending on
the pass translation unit. Behavior-preserving: 22/22 Rotom tests pass.
Refactor phase C. The 12 generate* functions now live in four op-family files
under LayoutAssignment/gen/ -- Structural (func/secret.generic/yield/
passthrough), Elementwise (add/sub/mul + linalg.generic), ReduceTranspose, and
Reshape (collapse/expand/extract/insert) -- all declared in Generators.h. The
linalg-body predicates (isElementwiseGeneric/hasAddLikeBody) move with the
elementwise generator.

LayoutAssignment.cpp drops from ~680 to ~420 lines: it now holds the pass
struct, the AssignmentContext implementation, the op dispatch, and the
selection phase. Adding a new tensor op is now: a generator in the relevant
gen/ file + a Generators.h declaration + one dispatch line.

Behavior-preserving: builds clean, 22/22 Rotom lit + execution tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant