wip: Rotom layout alignment, assignment, and lowering by edwjchen · Pull Request #2980 · google/heir

edwjchen · 2026-05-21T15:58:42Z

do not merge yet!

… execution tests

Add withRolls() and enumerateSingleRolls() to LayoutAlignment. The latter ranges over ordered pairs of distinct, equal-extent slot-side traversal dims and returns the materializable single-roll variants of a layout, the building block for seeding rolled (diagonal) packings into the layout search. Rolls only affect the slot line of the materialized layout, so a roll whose 'from' dim is ciphertext-side lowers as a no-op; restrict both roll indices to the slot side so every variant is a genuinely distinct packing.

Two pipeline-execution tests proving rolled (diagonal) layouts compute correctly through layout assignment, materialization, the relation-driven lowering, and the interpreter: - a shared rolled layout on both addf operands, and - a rolled operand added to a row-major operand, which forces the lowering to remap between the rolled and unrolled packings. Validates the premise that the existing brute-force path already handles rolled layouts, de-risking rolled-variant injection into the search.

Add RotomTensorOpLowering::lowerMatmulByRotation, a diagonal matmul kernel that replaces the brute-force per-scalar mask+remap lowering with ciphertext rotations. For a single-ciphertext layout it groups every (i,j,k) contraction term by the rotation pair that realizes it -- (srcSlot - dstSlot) mod numSlots for each operand -- and emits one masked rotate-multiply per group, accumulated into the destination operand. The construction is correct for any such layout (each contribution is realized exactly once at its destination, with a shift-collision guard and single-point/single-ciphertext gates that fall back to the brute-force path), and collapses to few rotations exactly when the layout is rolled/diagonal: a Halevi-Shoup matvec with a diagonal-packed matrix uses 14 rotations vs 32 row-major. Also add tensor_ext.rotate support to the OpenFHE interpreter so the kernel is execution-verifiable, update the matmul lowering golden (remap -> rotate), and add an execution test asserting the rolled packing uses fewer rotations and still computes the correct matvec.

…NIST@32768 Implements the Halevi-Shoup ciphertext-axis diagonal matvec with a baby-step/giant-step schedule, auto-discovered from a row-major seed, and threads it through the whole seed -> assign -> materialize -> convert pipeline so a full MNIST 784->512->10 MLP lowers at the reference ciphertext size 32768 with the reference packing (16-ciphertext layer 1). Kernels (RotomTensorOpLowering): - lowerMatvecCtDiagonalBsgs: square/squat single-period + replicated BSGS. - lowerMatvecDenseDiagonal: P=N/K diagonals per ciphertext (dense packing). - Both handle non-power-of-two M and K by doing slot arithmetic over the padded extents Dp=nextPow2(m), Kp=nextPow2(n) while verifying only the real [0,m)x[0,n) domain (padding is zero); identical for power-of-two dims. - Layout verification rebuilt the ISL relation per matrix element (O(m*n) ISL builds, minutes at network scale); now indexes each relation once (indexRangeByDomain) -> per-element lookups. Convert of a 512-hidden layer drops from minutes to ~16s. Search (LayoutAssignment): - Generates the rolled diagonal/identity candidates and a rotation-aware cost so the matvec kernel is discovered without a hand-written layout. - Kernel gate checks the squat contract on padded extents. - Fixes a back-propagation break where a bare operand candidate shadowed an intermediate's candidate (chained matmuls losing their kernel). Seed (SeedLayout): replication-to-fill -- a tensor smaller than the ciphertext (e.g. a 10x512 weight at n=32768) now gets a seed (full slot packing + a replication dim) so the search has a candidate to roll. Secret path (MaterializeTensorExtLayout): copy a secret.generic operand's materialized layout onto the corresponding function-argument attribute, so the type converter converts the secret arg (fixes a block-arg/operand type mismatch when a secret vector materializes wider than its data). Utils/Layout: guard isRelation{Bicyclic,RowMajor,PerRow,SquatDiagonal} against incompatible Presburger spaces (a replicated layout has extra local vars) instead of asserting inside IntegerRelation::isEqual. Pipeline: register --mlir-to-rotom-ciphertext (seed -> assign -> materialize -> convert) so high-level tensor IR in secret.generic regions lowers in one command with no hand-written layouts. Tests: RotomPipelineExecutionTest gains end-to-end numerical coverage for the square/squat/dense/replicated/under-filled/padded matvec regimes, a two-layer MLP (matvec -> square -> matvec), and an MNIST-layer-scale 512x1024 -> 16x512 MLP; plus a lit test for the registered pipeline.

A neural-net layer y = x . W^T is a vector-times-matrix matmul x(1xK) * M(KxN), but the ciphertext-axis diagonal matvec kernel only handles matrix * column- vector. SeedLayout now rewrites such matmuls (gate: lhs.dim0==1 && rhs.dim1>1) into transpose(matmul(M^T, x^T)) before seeding, reusing the existing kernel; when M = transpose(W) the pre-transpose W is used directly to avoid a redundant transpose-of-transpose. The matrix is then W (NxK) and must be squat (N<=K), which the real MNIST layers (512x784, 10x512) satisfy. Both matmuls of the real MNIST model now lower to RotomMatmul with no leftover matmul. Adds a lit test for the rewrite. Also guards two more IntegerRelation::isEqual call sites (ConvertTensor {Collapse,Expand}Shape) against incompatible Presburger spaces -- exposed by the rewrite's vector transposes lowering to reshapes over replicated layouts.

Seven fixes from the branch code review: - SeedLayout: only reuse a transpose producer's input when it is a true [1,0] permutation, else an identity transpose would build an invalid matmul(K x N, K x 1). - LayoutAssignment::visitMatmul: bail on dynamic shapes (matching the lowering) so the padded-extent cost model never divides by kDynamic. - MaterializeTensorExtLayout: error on a function argument feeding secret.generic operands with conflicting materialized layouts instead of silently keeping the first. - verifyLayoutRolls: reject a zero-extent rolled dim before the modulo (was a div-by-zero in the verifier). - lowerMatmulByRotation: index each relation once via indexRangeByDomain rather than rebuilding the IntegerRelation per (i,j,k) domain point. - lowerMatvecCtDiagonalBsgs: drop the dead 'n > Kp' gate (Kp = pow2(n)). - LayoutAssignment: use llvm::Log2_64_Ceil instead of a hand-rolled ceilLog2. heir-opt builds clean; all 28 Rotom tests pass.

Addresses two more code-review findings: google#4 layoutNumCiphertexts overcounted a dense straddling layout. Expose the straddle-aware inferCtPrefixLen from RotomAttributes (moved out of the anonymous namespace, declared in the header) and route the Rotom layout cost utilities through it, so a boundary dim that spans the ct/slot split contributes only its high (ciphertext) part. Adds a unit test pinning the 2-ciphertext count for a 4x8 straddling layout at n=16. google#11 the 'spaces compatible then isEqual' guard was duplicated at six sites. Factor it into relationsCompatibleAndEqual (lib/Utils/Layout) and route the four isRelation* pattern matchers and the two collapse/expand checks through it. Deliberately NOT isRelationEqual: that adds an ISL sampling (tryProveUnequal) pass that is wasted work on these one-off comparisons and measurably slowed layout propagation. heir-opt builds clean; layout_propagation, convert_to_ciphertext_semantics, and the Rotom suites all pass.

Code-review finding google#6. createMaskForPoints is atomic (validates all points, then builds one constant) but the three diagonal matvec kernels called it *after* emitting rotate/mul ops, so an (unreachable) mask failure returned failure() with a partially built chain left in the IR -- the caller treats that as 'kernel not applicable' and falls back, orphaning the dead ops. The masks are constants independent of the accumulation, so build them up front: the BSGS and dense kernels create the single gap mask right after the builder, and lowerMatmulByRotation pre-builds every group's mask before the emit loop. Now the only fallible step runs before any mutation, so a failure cannot orphan a rotation chain. Rotom execution + lit tests pass.

Code-review finding google#13. The func-arg layout propagation hand-rolled the block-arg -> func-arg attribute storage (dyn_cast<FunctionOpInterface> + getArgNumber + getArgAttr/setArgAttr). The shared association layer already implements exactly that routing (rule 1: a FunctionOpInterface block argument maps to its arg attr), so use findAttributeAssociatedWith to read the existing arg layout and setAttributeAssociatedWith to write it. Keeps the func-arg targeting (only function arguments need this) and the conflict diagnostic. No BUILD change (AttributeUtils already a dep). Rotom tests pass.

Code-review finding google#12. normalizeRowVectorMatmuls was a structural matmul rewrite (x.W^T -> transpose(matmul(W, x^T))) hardcoded at the top of the SeedLayout pass, which is otherwise about seeding layout attributes. Move it into a dedicated rotom-normalize-matmuls pass so the rewrite is independently registered, testable, and reorderable. - New pass NormalizeMatmuls (td/h/cpp) carrying the verbatim rewrite, with Linalg/Tensor as dependent dialects (it creates those ops). - SeedLayout drops the rewrite, its call, and the now-unused Linalg/Tensor includes + BUILD deps. - The mlir-to-rotom-ciphertext pipeline runs rotom-normalize-matmuls before rotom-seed-layout; the lit test invokes it explicitly. heir-opt builds clean; Rotom lit + execution tests pass.

Code-review finding google#9. lowerMatvecCtDiagonalBsgs and lowerMatvecDenseDiagonal duplicated ~120 lines of baby-step/giant-step accumulation, squat residual rotate-and-sum, and gap masking. Factor that into one emitDiagonalBsgs helper parameterized by the contraction vector, an extract-diagonal callback, the diagonal count, the rotation modulus, the residual limit, the gap mask, and an optional layout tag. The tag is a parameter rather than unified behavior on purpose: the single-period kernel works at the full ciphertext width (numSlots == output width) so its intermediates carry the output layout, whereas the dense kernel works K-wide (!= the N-wide output) and must not tag them. Each kernel keeps its own structure verification, gap-mask construction, and output placement. Behavior-preserving: the execution test (both single-period and dense matvecs) and the Rotom lit tests pass.

Replace the out-parameter with a small struct { size_t length; int64_t straddleSlotExtent; } so the don't-care caller no longer declares a throwaway local. Behavior unchanged; build + 28 tests pass.

…actor phase A) Relation-preserving first step toward a strided layout model. The ct/slot straddle was tracked as a StraddleRole{None,High,Low} per piece plus a global straddleSlotExtent, special-cased in the ISL address emitter (High -> floorDiv, Low -> mod). Replace that with a per-piece mixed-radix digit descriptor on LayoutData: pieceDivBy/pieceModBy, where a piece reads digit = (i / divBy) mod modBy of its tensor index. A whole-dim piece is (1, 0) => i; a straddle splits into a ct piece (L, 0) => i/L and a slot piece (1, L) => i mod L. This drops the StraddleRole enum and the straddle special-casing in emitSegmentAddress/emitSplitCtSlotIsl in favor of one uniform digit rule that also generalizes to N-way splits (groundwork for making the attribute stride the address weight). Attribute syntax and seeding are unchanged; the emitted ISL relations are byte-identical -- the lowering unit test, materialize lit tests, and execution test all pass.

…ensor axis Refactor phase B. Make the dim attribute's stride meaningful and unify the old 'repeated dim id' tiling with the straddle's mixed-radix split: - getDim() is the tensor axis. Pieces sharing a getDim() are a mixed-radix split of that one axis and share its domain variable (preprocessing dedups the traversal entry per tensor dim). - For a multi-piece axis, the attribute stride is the within-axis digit divisor: digit = (i / stride) mod extent, with strides being the cumulative products of the lower extents (1, e0, e0*e1, ...). Placement and the ct/slot split stay positional (the cumulative product of extents crosses n at the boundary). - A single-piece axis keeps the legacy behavior (stride ignored, digit == i), so existing layouts -- including seeds with non-unit strides and the detected straddle -- are byte-identical. - LayoutAttr::verify rejects an axis whose pieces are not a valid mixed-radix decomposition (e.g. duplicate strides). - The ISL emitter now sums per piece, not per dim, so an axis can place several digits in one segment. The old position-based 'tiled duplicate dim' interpretation is replaced by this: the tiled-row-major tests keep the same (ct,slot) packing, now expressed with one domain variable per axis. Full Rotom suite (lit/unit/execution), convert-to-ciphertext-semantics, and layout_propagation pass.

…muls pass Removes all Rotom matmul/diagonal machinery, keeping elementwise and rolled-layout functionality: - lowering kernels (lowerMatmul, diagonal/BSGS matvec) in RotomTensorOpLowering - the convert-to-ciphertext matmul pattern branch (supportsRotomMatmul) - matmul cost/layout/kernel-selection paths in LayoutAssignment - the NormalizeMatmuls pass (.td/.h/.cpp + BUILD/Passes.h/heir-opt wiring) - the dead KernelName::RotomMatmul enum value across Kernel + LayoutOptimization - dependent matmul lit tests and the matmul lowering unit test

With the diagonal matvec layouts removed, the only producer of un-split straddling dims is gone: SeedLayout always splits a boundary-spanning axis into an explicit ct piece + slot piece. So inferCtPrefixLen no longer needs to detect a straddle and auto-insert a slot piece during preprocessing. - inferCtPrefixLen returns size_t (the CtPrefix struct collapses to its one remaining field); a dim that does not fit the slot budget simply stays on the ciphertext axis. - preprocessLayoutData drops the auto-split branch; boundary-spanning axes must be expressed as explicit mixed-radix splits. - layoutNumCiphertexts counts the ct prefix directly, no straddle divide. - Migrate the straddle unit test to an explicit mixed-radix split.

A roll(i, j) shifts one dim by the other modulo their shared extent, so the two rolled dims must have equal extents (only the extents must match; strides may differ). Tighten the verifier accordingly and drop the dead non-zero extent check (DimAttr::verify already guarantees size > 0).

Rename LayoutData::pieceDivBy -> pieceStride and pieceModBy -> pieceExtent to match Rotom's [dim:extent:stride] notation. pieceExtent now always holds the piece's true extent (never a 0 sentinel): the "drop the redundant modulus on the most-significant digit" optimization moves from a preprocessing sentinel to an emitter-side check (stride * extent < the axis full extent). Byte-identical ISL output; also simplifies preprocessLayoutData.

…ivalence Reverts the pieceStride/pieceExtent rename. Since the sibling field is the positional address coeff (not a Rotom dim), the digit descriptor keeps the implementation-oriented divBy/modBy names (and the 0 = no-modulus sentinel), with a comment noting they are the piece Rotom stride/extent.

Separate the two intertwined concerns: generateCandidates() runs a forward pre-order walk that fills the candidate map for every value (returns skipped), then selectLayouts() runs the backward search from each return over the now- complete candidate space. visitReturn moves out of the generation TypeSwitch into the search phase. Behavior-preserving (the candidate map is the one-way handoff); method declarations are grouped into generation vs search, with the generation visitors kept self-contained to ease moving each op to its own file as the kernel space grows.

It has no callers outside LayoutAlignment.cpp and only adapts a LayoutAttr to the (dims, n) core in RotomAttributes. Drop the public header declaration and mark the definition static so there is one public inferCtPrefixLen, not a same-named overload across two headers.

…ridge) Stage 3a of wiring conversions to the shift network: shiftNetworkConversionCost bridges two rotom layouts to tensor_ext (the same lowerToTensorExtIsl the materializer uses) and calls computeCostOfLayoutConversion, which composes inverse(from) o to into a (ct,slot) mapping, finds a Vos-Vos-Erkin shift scheme, and counts rotations. This is the real cost that conversionMoves only proxied; conversionMoves stays the cheap aligned fast-path. Additive -- not yet wired into the search loop (that needs caching + expected test-churn).

…earch Stage 3b: chooseElementwiseKernels now scores conversions with the real Vos-Vos-Erkin rotation count (cachedConversionCost) instead of the move-count proxy. cachedConversionCost gates on the conversionMoves(...) aligned fast path (0, no VVE), caches by layout pair, and falls back to the proxy when a layout cannot be lowered to tensor_ext. Adds tensor_ext to getDependentDialects. Also fixes a latent bug this exposed in the shared cost model: the RotationCountVisitor in LayoutConversionCost.cpp asserted on SplatNode (a broadcast constant) -- it now counts it as 0 rotations like the other leaves. Re-blesses layout_assignment.mlir: two elementwise functions select different (still-valid) layouts under the real cost; the pipeline-execution tests confirm the selections remain lowerable and numerically correct.

withRolls / enumerateSingleRolls were only exercised by unit tests -- the layout search never enumerates rolls, so the rolled-candidate path is unverified end-to-end. Remove them and their tests. The roll *lowering* in the emitter stays (materialize_rolled_row_major and the rolled pipeline/evaluate tests still pass). Also note on supportsRotomAlignmentLowering that it is elementwise- specific and should be decomposed per tensor operator as other ops gain compute kernels with different alignment goals.

Keep the behavioral pipeline tests -- the end-to-end RotomPipelineExecutionTest (runs seed/assign/materialize/convert + checks numerics) and the materialize / IR / seed lit tests. Remove the churny ones that pin exact selected layouts or lowering IR: the LayoutAlignmentTest and RotomTensorExtLayoutLoweringTest C++ unit tests, and layout_assignment.mlir / layout_assignment_rotom_elementwise_ lowering.mlir. Removed for now; correctness is covered end-to-end.

Replace the per-element mask/remap accumulation loop in lowerElementwiseBinary with the convert-then-compute shape: convert each operand to the shared output layout via tensor_ext.convert_layout (a no-op when already aligned), then one elementwise op at that layout. Add implement-shift-network to the rotom pipeline (heir-opt) and the execution-test harness to lower the conversions into rotations + masks. Remove the now-dead align/mask/remap helpers. Execution test confirms numeric correctness end-to-end.

Revert the test trim (2497f20): all four removed tests (LayoutAlignmentTest, RotomTensorExtLayoutLoweringTest, layout_assignment.mlir, layout_assignment_rotom_elementwise_lowering.mlir) predate this work and are kept. Re-bless the elementwise lowering test to the convert-then-compute model: func-arg operands are unified onto one layout by the materializer, so the lowering is a bare elementwise op with no conversion. Also restore the original roll verifier message ("rolled dims must have the same extent (size)").

…phase 1) Extract the op-independent free functions out of the 1080-line pass file into two cohesive sibling units (added to the same LayoutAssignment library): - Candidate.{h,cpp}: candidate vocabulary (KernelKind, Candidate), arith-op classification, the cost model, and ranking/dedup (isBetterCandidate, uniqueCandidates). - DimMaps.{h,cpp}: the pure per-op dim maps (reduction/collapse/expand/ extract/insert-slice) plus remapLayoutDims and the candidate-generation combinators (remapCandidates, chooseCommonCandidates). LayoutAssignment.cpp keeps only the value classification helpers and the pass itself, dropping from 1080 to 676 lines. No behavior change: the full Rotom suite (incl. the layout_assignment.mlir layout pins and the pipeline execution test) passes unchanged.

Subtraction is an additive, depth-0 elementwise op, so it reuses the RotomAdd kernel (both operands at the shared layout); only the emitted arith op differs. - Utils: add makeAppropriatelyTypedSubOp (arith.subf/subi by element type). - selectRotomElementwiseKernel: map isAddLike (add + sub) -> RotomAdd. - RotomTensorOpLowering: emit subf/subi for SubF/SubI ops. - ConvertToCiphertextSemantics: register ConvertRotomElementwiseBinary for SubFOp/SubIOp and treat them as RotomAdd in expectedKernel(). The assignment dispatch already routed SubF/SubI to visitElementwise. Verified numerically (execution test now checks a - b) and at the IR level (lit test).

Replace the backward greedy selection with a forward cost-DP whose accumulated cost is correct on DAGs. Each Candidate now carries a `cone`: the chosen layout and local cost of every value feeding it. Because the cone is keyed by Value, a shared subexpression appears once, so `cost` (the sum over the cone) never double-counts -- it is the true total over the candidate's assignment map. Combining operands merges their cones and drops any pairing whose sub-assignments disagree on a shared value, so cones stay internally consistent. Selection is then just: at each function's returns, fold in the cheapest consistent cone; that cone is the output assignment (no markSelected backward pass). Kernels are re-derived from the final layouts. On trees the cone cost equals the previous accumulated cost, so the existing structural/elementwise tests are unchanged; a new diamond execution test covers the shared-value DAG path. Pruning keeps min-cost per layout (exact on trees and shallow reconvergence).

…ost" Drop the invented "cone" vocabulary. A Candidate now carries: - `assignment` (was `cone`): the layout chosen for every value feeding it, with each kernel's local cost; - `accumulatedCost` (was `cost`): the dedup'd sum over that assignment. Helpers follow: coneCost -> accumulatedCostOf, mergeCones -> mergeAssignments. Pure rename + comment cleanup; no behavior change (24/24 Rotom tests pass).

The file-local helpers are already `static`, which gives internal linkage, so the wrapping anonymous namespace was redundant. Drop it, keeping `static` (the LLVM-preferred form for free functions; reserve anonymous namespaces for types). No behavior change.

Replace the hand-rolled raw_string_ostream wrappers with MLIR's debugString helper: drop layoutKey entirely (candidateTieKey calls debugString directly) and reduce kernelKey to a one-liner. No behavior change.

Replace the layout-assignment cost estimates with a single cost model whose units are commensurate, so conversions and compute can be summed meaningfully: - CostModel.{h,cpp} + cost_model.json: relative HE op-cost weights in key-switch units (rotation = ctMultiply = 100, add = 1). Defaults mirror the JSON; set ROTOM_COST_MODEL=<path> to override at runtime (temporary hook). - cachedConversionCost now returns the real Vos-Vos-Erkin rotation count weighted by the rotation cost (the proxy is only the unlowerable fallback). - operationCost: add -> add*numCt, mul -> ciphertextMultiply*numCt (was an ad-hoc 1x / 10x). - Drop the layoutConversionCost proxy; chooseCommonCandidates takes a conversion-cost callback so the structural ops use the real cached cost too. Re-blessed layout_assignment.mlir (one selection shifted under the new weights); execution numerics unchanged. 24/24 Rotom tests pass.

… contract operationCost/genericOperationCost already route through the cost model and are computed from the aligned (compute) layout the operands are converted to; make that contract explicit (rename the param to alignedLayout, document it). Fix the one op that wasn't on the model or the input layout: a reduction's local cost was the *output* (reduced) ciphertext count, unweighted. A reduce sums its input ciphertexts, so charge numCt(inputLayout) * add instead. No existing selection shifts; 24/24 Rotom tests pass.

…lback 1. Remove the runtime cost-model loading: getCostModel() now just returns the compiled-in defaults (CostModel.cpp no longer reads ROTOM_COST_MODEL or parses JSON). cost_model.json stays as documentation of those defaults. 2. The shift network always produces a cost for assignable layouts (the search only generates materializable ones), so cachedConversionCost asserts the VVE result instead of falling back to the move-count proxy. Drop the now-dead conversionCost(moves). No behavior change; 24/24 Rotom tests pass.

FHE packing requires static shapes (a tensor is laid out into a fixed ciphertext slot count), so a dynamic dimension can never reach layout assignment. Remove the isDynamic helper and its four guards in the per-op dim-map computers. No behavior change.

The per-op dim maps are total over the input's dims (every dim resolves to a result index or -1), and value-compatible layouts only reference in-range dims, so remapLayoutDims can never fail. Drop the dead unmapped-sentinel (-2) branch, assert the in-range invariant, and return a plain LayoutAttr; remapCandidates no longer threads an optional. No behavior change.

Replace the shared group comment over getReduction/Collapse/Expand/Extract/ InsertSliceDimMap with a one-line docstring per function naming the op it maps and how it relabels dims.

…e of truth The JSON was no longer loaded (only documented the defaults), so drop it and tighten the struct comment.

Its single file-local helper is now static (LLVM-preferred form for free functions), matching the RotomAttributes.cpp cleanup.

Sweep the no-type anonymous namespaces in the files touched this session and convert to the LLVM-preferred form (static for free functions/variables, reserving anonymous namespaces for class declarations): - LayoutAssignment.cpp: the two attr-name constexprs (already internal). - LayoutAlignment.cpp: atomizeSlotRegion -> static. - RotomTensorOpLowering.cpp: static alias refs + setMaterializedAttr; usings move to file scope. Left as-is: LayoutAssignment.cpp / SeedLayout.cpp anonymous namespaces wrap the pass struct (the LLVM-recommended use), and the foundational/test files.

The tiled layout [[0:2:2],[1:2:2],[0:2:1],[1:2:1]] describes a 4x4 tensor (logical dims 0 and 1 each split into a high/low digit, extent 4), packed into 2 ciphertexts of 8 slots. The test wrongly applied it to tensor<2x2x2x2>, whose size-2 dims made the layout look incompatible and get silently dropped. With the correct tensor<4x4>, the layout flows through assignment and materialization (arg -> tensor<2x16>) and the add stays a plain ciphertext add -- still not Rotom-lowered, since the tiled layout has non-unit strides.

The bazel-<worktree> convenience symlink (bazel-rotom-rolls-phase1) was accidentally committed in 8bf3927 because .gitignore only listed the main workspace's symlinks. Untrack it and generalize the ignore to /bazel-*.

…atchers 93b5624 added an isCompatible guard and deduped it into relationsCompatibleAndEqual across the named-layout matchers (isRelationSquatDiagonal/RowMajor/PerRow/Bicyclic) and the collapse/expand-shape checks. That was motivated by the Rotom diagonal path, which has since been removed, so the guard is vestigial. Restore origin/main's direct relation.isEqual(...) and drop the helper. 26 convert-to-ciphertext lit tests plus the Rotom/Layout suites pass.

Refactor phase A of the LayoutAssignment split. Each of the 12 per-op candidate visitors (visitFunc/visitElementwise/visitTranspose/... ) becomes a free static generate*(AssignmentContext&, XOp) function; visitOperation dispatches to them. The pass implements the new narrow AssignmentContext interface -- the seed/candidate/cost API the generators call back into -- so each generator depends only on that, not on the pass struct. This is the decoupling that lets the generators move to per-category files next. Also tighten the file-local anonymous namespace to wrap only the LayoutAssignment struct; the free helpers and generators are now plain static functions per the LLVM convention. Behavior-preserving: builds clean, 22/22 Rotom lit + execution tests pass.

Move getPlainValueType / isTensorLike / isLayoutCompatibleWithValue out of LayoutAssignment.cpp into a small ValueUtils.{h,cpp}. These are the shared value-type predicates the per-op generators rely on; giving them a header lets the generators move to their own files (refactor phase C) without depending on the pass translation unit. Behavior-preserving: 22/22 Rotom tests pass.

Refactor phase C. The 12 generate* functions now live in four op-family files under LayoutAssignment/gen/ -- Structural (func/secret.generic/yield/ passthrough), Elementwise (add/sub/mul + linalg.generic), ReduceTranspose, and Reshape (collapse/expand/extract/insert) -- all declared in Generators.h. The linalg-body predicates (isElementwiseGeneric/hasAddLikeBody) move with the elementwise generator. LayoutAssignment.cpp drops from ~680 to ~420 lines: it now holds the pass struct, the AssignmentContext implementation, the op dispatch, and the selection phase. Adding a new tensor op is now: a generator in the relevant gen/ file + a Generators.h declaration + one dispatch line. Behavior-preserving: builds clean, 22/22 Rotom lit + execution tests pass.

edwjchen added 2 commits May 21, 2026 13:06

Rotom layout alignment, assignment, and lowering

89946e7

update

a6ef15e

AlexanderViand marked this pull request as draft May 21, 2026 16:52

asraa self-requested a review May 21, 2026 16:52

edwjchen added 26 commits June 11, 2026 21:23

Mask remap outputs in Rotom lowering, interpret tensor_ext.remap, add…

42b39b4

… execution tests

Merge remote-tracking branch 'upstream/main' into rotom_layout

617f6da

Update Rotom tests to new dim/layout attribute syntax from upstream

80f647f

Rotom: return CtPrefix struct from inferCtPrefixLen

dc9d27a

Replace the out-parameter with a small struct { size_t length; int64_t straddleSlotExtent; } so the don't-care caller no longer declares a throwaway local. Behavior unchanged; build + 28 tests pass.

Rotom: document preprocessLayoutData

7cb6d1b

Rotom: tighten preprocessLayoutData docstring

235e79b

Rotom: trim roll equal-extent comment

b3fe1e5

Rotom: drop two stale inline comments in RotomAttributes

7694ceb

edwjchen added 30 commits June 22, 2026 17:08

Rotom: use mlir::debugString for candidate tie-key stringification

f57d686

Replace the hand-rolled raw_string_ostream wrappers with MLIR's debugString helper: drop layoutKey entirely (candidateTieKey calls debugString directly) and reduce kernelKey to a one-liner. No behavior change.

Rotom: tight per-function docstrings on the DimMaps dim-map helpers

6c79f1d

Replace the shared group comment over getReduction/Collapse/Expand/Extract/ InsertSliceDimMap with a one-line docstring per function naming the op it maps and how it relabels dims.

Rotom: drop the verbose _comment from cost_model.json

6525f28

Rotom: remove cost_model.json; the RotomCostModel struct is the sourc…

004afe6

…e of truth The JSON was no longer loaded (only documented the defaults), so drop it and tighten the struct comment.

Rotom: drop redundant anonymous namespace in DimMaps.cpp

43e6fbd

Its single file-local helper is now static (LLVM-preferred form for free functions), matching the RotomAttributes.cpp cleanup.

Rotom: stop tracking the per-worktree bazel symlink

4cca55a

The bazel-<worktree> convenience symlink (bazel-rotom-rolls-phase1) was accidentally committed in 8bf3927 because .gitignore only listed the main workspace's symlinks. Untrack it and generalize the ignore to /bazel-*.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

wip: Rotom layout alignment, assignment, and lowering#2980

wip: Rotom layout alignment, assignment, and lowering#2980
edwjchen wants to merge 62 commits into
google:mainfrom
edwjchen:rotom_layout

edwjchen commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

edwjchen commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant