LEGO: Layout Expression Language for Code Generation

LEGO is an algebraic, compiler-agnostic framework for specifying and transforming memory layouts. It provides composable layout primitives that lower through a custom MLIR dialect to generate optimized code for CPU and GPU targets.

[LEGO: A Layout Expression Language for Code Generation of Hierarchical Mapping] [CGO 2026 Artifact]

Project Structure

LEGO/
├── python/                  # Python package (lego-layout)
│   ├── lego/
│   │   ├── core.py          # Layout primitives (Row, Col, RegP, GenP, OrderBy, GroupBy, TileByLayout)
│   │   ├── rewriter.py      # DSL-agnostic AST rewriting engine
│   │   ├── python_printer.py # SymPy code printer (base + DSL subclasses)
│   │   ├── rust_printer.py  # Rust code printer
│   │   ├── fortran_printer.py # Fortran code printer
│   │   ├── cxx_printer.py   # C++ code printer
│   │   ├── julia_printer.py # Julia code printer
│   │   ├── cuda_c_printer.py # CUDA C code printer
│   │   ├── js_printer.py    # JavaScript code printer
│   │   ├── glsl_printer.py  # GLSL code printer
│   │   ├── backend/         # MLIR compilation, JIT, SymPy lowering, PyTorch autograd
│   │   └── frontends/       # DSLAdapter ABC + adapters (Triton, cuTile, Numba, JAX, Rust, Fortran, C++, Julia, CUDA C, JS, GLSL, python_mlir)
│   ├── examples/            # Usage examples (triton, numba_cuda, jax, cutile, python_mlir, symbolic, rust, cxx, fortran, julia, cuda_c, js, glsl)
│   │   └── puzzles/         # GPU puzzles — multi-backend kernel tests (CUDA, ROCm, Vulkan, WebGPU, Metal)
│   └── tests/               # Python tests
│
├── include/Lego/           # MLIR dialect headers (ODS definitions, passes)
├── lib/Lego/               # MLIR dialect implementation (lowering, verification, simplification)
├── tools/lego-opt/         # MLIR optimizer CLI
├── test/                   # MLIR lit tests
│
├── viz/                    # LEGO Studio (browser-based visualizer)
│   ├── wasm/               # Emscripten-compiled LEGO compiler (lego_driver.wasm)
│   ├── js/                 # Frontend JavaScript
│   └── css/                # Styles
│
├── paper/                  # Paper benchmarks and evaluation scripts
├── docs/                   # Architecture and dialect documentation
├── scripts/                # Setup scripts
└── CMakeLists.txt          # Build system (monolithic and decoupled modes)

Architecture

All paths flow through the MLIR lego dialect, which normalizes, simplifies, and strength-reduces layout expressions before handing off to target-specific backends. JIT frontends lower through the dialect, extract simplified patterns, then return control to the original framework. Source code generators lower through the dialect, extract SymPy expressions from the optimized arith IR, then emit target-language source. GPU/CPU backends lower the dialect all the way to machine code:

                                  User Code
                                      |
            +-------------------------+-------------------------+
            |                         |                         |
  +---------+---------+  +-----------+----------+  +------------+----------+
  |  JIT Frontends    |  |     GPU / CPU        |  |   Source CodeGen      |
  |  Triton, Numba,   |  |   KernelBuilder,     |  |  Rust, C++, Fortran, |
  |  JAX, cuTile      |  |   Tensor API         |  |  Julia, CUDA C,      |
  +---------+---------+  +-----------+----------+  |  JS, GLSL            |
            |                        |              +------------+----------+
            +------------+-----------+---------------------------+
                         |
            +------------+------------+
            |    lego MLIR dialect    |
            |  ...................    |
            |  normalization          |
            |  lowering               |
            |  simplification         |
            |  strength reduction     |
            |  verification (SMT)     |
            +-----+-------+--------+-+
                  |       |        |
                  v       v        v
  +---------------+  +----+----+  ++----------------+
  | extract       |  | extract |  | compile         |
  | patterns      |  | SymPy   |  | to target       |
  +-------+-------+  +----+----+  +---+-----+-----++
          |                |           |     |     |
  +-------+--------+  +---+------+    |     |     |
  | return to      |  | code     |    |     |     |
  | original       |  | printers |    |     |     |
  | framework      |  +---+------+    |     |     |
  |                |      |           |     |     |
  | Triton PTX,    |  +---+------+    |     |     |
  | Numba CUDA,    |  | target   |    |     |     |
  | JAX XLA,       |  | source   |    |     |     |
  | cuTile         |  | code     |    |     |     |
  +----------------+  +----------+    |     |     |
                                      |     |     |
    +------------------+------------------+------------------+---------------------+---------------------+
    |                  |                  |                  |                     |                     |
+---+------------+ +---+------------+ +---+------------+ +---+---------------+ +---+---------------+   |
| lego-to-llvm   | | lego-to-nvvm   | | lego-to-rocdl  | | lego-to-xevm    | | lego-to-spirv     |   |
+---+------------+ +---+------------+ +---+------------+ +---+---------------+ +---+---------------+   |
    |                  |                  |                  |                     |                     |
+---+------------+ +---+------------+ +---+------------+ +---+---------------+ +---+---------------+   |
|   CPU          | |   CUDA         | |   AMD          | |   Intel GPU      | |   SPIR-V          |   |
|   X86, ARM     | |   PTX/cubin    | |   HSACO        | |   binary         | |   (Vulkan)        |   |
+----------------+ +----------------+ +----------------+ +------------------+ +---+---+---+---+---+   |
                                                                                  |   |   |   |       |
                   +------------------+                              +--------+   |   |   |   |       |
                   | lego-to-llvmspirv |                             |  naga  +---+   |   |   |       |
                   +---+--------------+                              +--------+       |   |   |       |
                       |                                             +--------+       |   |   |       |
                   +---+--------------+                              |  naga  +-------+   |   |       |
                   |   LLVM SPIR-V    |                              +--------+           |   |       |
                   |   (OpenCL)       |                              +--------+           |   |       |
                   +------------------+                              |  naga  +-----------+   |       |
                                                                     +--------+               |       |
                                                                     +--------+               |       |
                                                                     |  naga  +---------------+       |
                                                                     +---+----+                       |
                                                                         |                            |
                                                            +------+  +--+---+  +-------+  +---------++
                                                            | WGSL |  | MSL  |  | GLSL  |  |  WebGL   |
                                                            +------+  +------+  +-------+  +----------+

Frontends

Frontend	Module	Decorator / entry point	Description
Triton	`lego.frontends.triton_jit`	`@lego.jit`	Transforms Triton GPU kernels via AST rewriting; supports `block_ptr` (TMA) code generation (vecadd, matmul, vecadd block_ptr, matmul block_ptr)
cuTile	`lego.frontends.cutile_jit`	`@lego.jit`	Transforms `cuda.tile` (cuTile) kernels via AST rewriting (vecadd, matmul)
Numba CUDA	`lego.frontends.numba_jit`	`@lego_jit`	Transforms Numba CUDA kernels, scalar thread indexing (vecadd, matmul)
JAX	`lego.frontends.jax_jit`	`@lego_jit`	Transforms JAX functions, preserves `static_argnums` (vecadd, matmul)
Tensor API	`lego.frontends.python_mlir`	--	JIT-compiled layout transforms for NumPy/PyTorch with `torch.compile` support (example)
Rust	`lego.frontends.rust_gen`	`lego.rust_gen.generate()`	Generates Rust source code (example)
Fortran	`lego.frontends.fortran_gen`	`lego.fortran_gen.generate()`	Generates Fortran source code (example)
C++	`lego.frontends.cxx_gen`	`lego.cxx_gen.generate()`	Generates C++ source code (example)
Julia	`lego.frontends.julia_gen`	`lego.julia_gen.generate()`	Generates Julia source code (example)
CUDA C	`lego.frontends.cuda_c_gen`	`lego.cuda_c_gen.generate()`	Generates CUDA C kernel source code (example)
JavaScript	`lego.frontends.js_gen`	`lego.js_gen.generate()`	Generates JavaScript source for WebGPU/WASM (example)
GLSL	`lego.frontends.glsl_gen`	`lego.glsl_gen.generate()`	Generates GLSL shader source code (example)
Symbolic	`lego.core`	--	SymPy-based algebraic layout expressions (example)

Each JIT frontend implements the DSLAdapter interface (frontends/_adapter.py), which defines four hooks: unwrap, find_runtime_vars, get_code_printer, and compile_and_wrap. The DSL-agnostic rewriter (rewriter.py) handles AST transformation and symbolic evaluation. The Triton adapter additionally supports block_ptr (TMA) code generation, emitting tl.make_block_ptr / tl.advance calls with automatic boundary checks.
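As a rough illustration of how the four hooks could fit together, here is a minimal sketch. The class names, method signatures, and the `rewrite` driver are assumptions made for this example and are not taken from `frontends/_adapter.py`; only the four hook names come from the text above. The toy adapter treats plain Python as its "DSL" and leaves the AST unchanged.

```python
import ast
import inspect
from abc import ABC, abstractmethod
from typing import Any, Callable


class DSLAdapterSketch(ABC):
    """Hypothetical stand-in for DSLAdapter; all signatures are assumed."""

    @abstractmethod
    def unwrap(self, fn: Callable) -> Callable:
        """Return the plain Python function behind any framework decorator."""

    @abstractmethod
    def find_runtime_vars(self, tree: ast.AST) -> set[str]:
        """Return names whose values are only known when the kernel runs."""

    @abstractmethod
    def get_code_printer(self) -> Callable[[Any], str]:
        """Return a printer that renders an index expression as DSL source."""

    @abstractmethod
    def compile_and_wrap(self, source: str, name: str) -> Callable:
        """Compile rewritten source and hand the function back to the host."""


class PlainPythonAdapter(DSLAdapterSketch):
    """Toy adapter whose 'DSL' is ordinary Python, for illustration only."""

    def unwrap(self, fn):
        return inspect.unwrap(fn)

    def find_runtime_vars(self, tree):
        # Simplification: treat every function argument as a runtime variable.
        func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
        return {arg.arg for arg in func.args.args}

    def get_code_printer(self):
        return str

    def compile_and_wrap(self, source, name):
        namespace: dict[str, Any] = {}
        exec(compile(source, "<rewritten>", "exec"), namespace)
        return namespace[name]


def rewrite(adapter, fn, source):
    """Call the four hooks in order. Source is passed in explicitly so the
    sketch does not depend on inspect.getsource."""
    plain = adapter.unwrap(fn)
    tree = ast.parse(source)
    runtime_vars = adapter.find_runtime_vars(tree)
    printer = adapter.get_code_printer()
    # A real rewriter would substitute printer(simplified_index) into `tree`
    # here; this sketch leaves the AST untouched.
    return adapter.compile_and_wrap(ast.unparse(tree), plain.__name__), runtime_vars


SOURCE = "def vecadd_offset(pid, block):\n    return pid * block\n"


def vecadd_offset(pid, block):
    return pid * block


new_fn, runtime_vars = rewrite(PlainPythonAdapter(), vecadd_offset, SOURCE)
```

Keeping the hooks this narrow is what lets a single rewriter serve several frameworks: the adapter owns everything framework-specific, and the driver never needs to know which DSL it is targeting.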

Source Code Generation Backends

Seven source-code generation backends take a Python function with LEGO layout expressions and emit equivalent index arithmetic in the target language. Each leverages SymPy's built-in code printers:

import lego
from lego.core import OrderBy, Row

def index_kernel(M, N, BM, BN):
    L = OrderBy(Row(M, N)).TileBy((M // BM, N // BN), (BM, BN))
    offset = L[pid_m, pid_n, :, :]
    return offset

rust_src    = lego.rust_gen.generate(index_kernel)
cxx_src     = lego.cxx_gen.generate(index_kernel)
fortran_src = lego.fortran_gen.generate(index_kernel)
julia_src   = lego.julia_gen.generate(index_kernel)
cuda_src    = lego.cuda_c_gen.generate(index_kernel)
js_src      = lego.js_gen.generate(index_kernel)
glsl_src    = lego.glsl_gen.generate(index_kernel)

Key differences by language:

Feature	Rust	C++	Fortran	Julia	CUDA C	JavaScript	GLSL
Range	`(0..N)`	`std::views::iota(0, N)`	`(/ (i, i=0, N-1) /)`	`(0:N-1)`	comment	`Array.from(...)`	comment
Floor div	`a / b`	`a / b`	`a / b`	`div(a, b)`	`a / b`	`Math.floor(a/b)`	`a / b`
Modulo	`a % b`	`a % b`	`mod(a, b)`	`mod(a, b)`	`a % b`	`a % b`	`a % b`
Power	`.powi(n)`	`std::pow(a, n)`	`a**n`	`a^n`	`pow(a, n)`	`Math.pow(a, n)`	`pow(a, n)`
Sqrt	`(x as f64).sqrt()`	`std::sqrt(x)`	`sqrt(dble(x))`	`sqrt(x)`	`sqrt(x)`	`Math.sqrt(x)`	`sqrt(x)`
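To make the table concrete, the sketch below renders the two operations that dominate tiled index arithmetic (tile id and offset within a tile) in each target language. The operator spellings are copied from the table; the dictionaries and the `emit_tile_coords` helper are illustrative and are not how the LEGO printers are implemented.

```python
# Operator spellings copied from the table above; the dict layout is illustrative.
FLOOR_DIV = {
    "Rust": "{a} / {b}",
    "C++": "{a} / {b}",
    "Fortran": "{a} / {b}",
    "Julia": "div({a}, {b})",
    "CUDA C": "{a} / {b}",
    "JavaScript": "Math.floor({a}/{b})",
    "GLSL": "{a} / {b}",
}
MODULO = {
    "Rust": "{a} % {b}",
    "C++": "{a} % {b}",
    "Fortran": "mod({a}, {b})",
    "Julia": "mod({a}, {b})",
    "CUDA C": "{a} % {b}",
    "JavaScript": "{a} % {b}",
    "GLSL": "{a} % {b}",
}


def emit_tile_coords(language, index="i", tile="BM"):
    """Return source strings for (tile id, offset inside tile) in `language`."""
    tile_id = FLOOR_DIV[language].format(a=index, b=tile)
    within = MODULO[language].format(a=index, b=tile)
    return tile_id, within


for language in FLOOR_DIV:
    print(language, emit_tile_coords(language))
```

Note that integer `/` truncates toward zero in Rust, C++, Fortran, and CUDA C, so it agrees with floor division only for non-negative operands. That condition normally holds for index arithmetic, which is presumably why the plain operator suffices there.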

Tensor API

The Tensor API provides layout constructors and transforms for NumPy and PyTorch:

import numpy as np
from lego import Tiled, ColMajor, RowMajor, ZCurve, Swizzle, BlockCyclic, Batched

# Basic layouts
tensor = np.arange(64).reshape(8, 8)
layout = Tiled((8, 8), tile_shape=(4, 4))
result = layout.transform(tensor)          # or layout(tensor)
back = layout.inverse_transform(result)

# GPU-oriented layouts
z = ZCurve((4, 4))          # Morton curve for 2D spatial locality
s = Swizzle((8, 8))         # XOR swizzle to avoid shared memory bank conflicts
bc = BlockCyclic((16,), 2, 2)  # ScaLAPACK-style distribution

# Batched transforms (vectorized, no Python loop)
batched = Batched(layout, batch_shape=(32,))
batch_tensor = np.zeros((32, 8, 8))
batched.transform(batch_tensor)  # (32, 8, 8)

# Composition and comparison (layout_a and layout_b stand for any two compatible layouts)
composed = layout_a.compose(layout_b)
assert RowMajor((4, 4)) == RowMajor((4, 4))
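For intuition about what a tiled transform computes, the following pure-Python sketch maps logical coordinates of an 8x8 array with 4x4 tiles to a linear offset and back. It assumes tiles are stored in row-major order and elements within a tile are also row-major; the ordering that `Tiled` actually uses is not stated here, and the function names are hypothetical.

```python
def tiled_offset(i, j, shape=(8, 8), tile=(4, 4)):
    """Logical (i, j) -> linear offset, assuming row-major tiles and
    row-major elements inside each tile."""
    _, cols = shape
    tile_h, tile_w = tile
    tiles_per_row = cols // tile_w
    tile_id = (i // tile_h) * tiles_per_row + (j // tile_w)
    within = (i % tile_h) * tile_w + (j % tile_w)
    return tile_id * (tile_h * tile_w) + within


def tiled_coords(offset, shape=(8, 8), tile=(4, 4)):
    """Inverse of tiled_offset: linear offset -> logical (i, j)."""
    _, cols = shape
    tile_h, tile_w = tile
    tiles_per_row = cols // tile_w
    tile_id, within = divmod(offset, tile_h * tile_w)
    tile_i, tile_j = divmod(tile_id, tiles_per_row)
    in_i, in_j = divmod(within, tile_w)
    return tile_i * tile_h + in_i, tile_j * tile_w + in_j


# Every logical coordinate should survive a round trip through the layout.
round_trips = all(
    tiled_coords(tiled_offset(i, j)) == (i, j)
    for i in range(8)
    for j in range(8)
)
```

The forward map uses only `//`, `%`, `*`, and `+`, which is the kind of expression the simplification and strength-reduction passes described under MLIR Backend are meant to optimize.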

PyTorch integration compiles layout transforms to native PyTorch arithmetic via the MLIR lowering pipeline. Instead of materializing O(numel) permutation tables, layout index expressions are lowered through MLIR (lego-lower pass with simplification and strength reduction), extracted as SymPy expressions, and compiled to vectorized PyTorch functions. For example, Col(4,8) becomes 4*j + i -- pure arithmetic, no lookup table.
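The claim that Col(4,8) reduces to `4*j + i` can be checked without LEGO installed. The sketch below builds the permutation table by enumerating storage column by column and compares it with the closed-form expression; the helper names are hypothetical and the code is independent of the LEGO API.

```python
ROWS, COLS = 4, 8


def col_major_offset(i, j):
    """Closed-form offset quoted in the text for Col(4,8)."""
    return 4 * j + i


def build_permutation_table():
    """Lookup-table alternative: enumerate storage column by column."""
    table = {}
    position = 0
    for j in range(COLS):
        for i in range(ROWS):
            table[(i, j)] = position
            position += 1
    return table


TABLE = build_permutation_table()

# The arithmetic form and the O(numel) table agree on every element.
agrees = all(
    col_major_offset(i, j) == TABLE[(i, j)]
    for i in range(ROWS)
    for j in range(COLS)
)
```

The table needs one entry per element, while the arithmetic form is constant-size and vectorizes directly over index tensors.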

import lego
import torch

layout = lego.ColMajor((4, 8))
x = torch.randn(4, 8)

# Transform: uses compiled arithmetic (arange + mul + add + gather)
physical = layout.transform(x)       # autograd-compatible
logical = layout.inverse_transform(physical)  # round-trips exactly

# LegoTensor: layout-aware tensor subclass
lx = lego.as_lego_tensor(x, layout)
result = lx + lx          # operates on physical storage, no permutation
back = result.to_logical() # converts back to row-major

# torch.compile: traces through compiled index arithmetic
@torch.compile(backend="lego")
def fn(t):
    return layout.transform(t) * 2

LegoTensor is a torch.Tensor subclass that carries layout metadata. Elementwise ops between same-layout tensors operate directly on physical storage. LegoArray provides the same for NumPy.
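The reason same-layout elementwise ops can skip the permutation is that a layout is a bijection on positions, so applying an elementwise op and then un-permuting gives the same result as un-permuting first. The toy container below demonstrates this with plain lists; `LayoutList` and `col_major_4x8` are hypothetical names for this sketch and do not reflect how `LegoTensor` or `LegoArray` are implemented.

```python
class LayoutList:
    """Toy layout-aware container; illustrative only."""

    def __init__(self, physical, shape, offset_fn):
        self.physical = physical      # values in storage order
        self.shape = shape            # logical (rows, cols)
        self.offset_fn = offset_fn    # logical (i, j) -> storage offset

    @classmethod
    def from_logical(cls, rows, offset_fn):
        n_rows, n_cols = len(rows), len(rows[0])
        physical = [None] * (n_rows * n_cols)
        for i in range(n_rows):
            for j in range(n_cols):
                physical[offset_fn(i, j)] = rows[i][j]
        return cls(physical, (n_rows, n_cols), offset_fn)

    def __add__(self, other):
        # Same layout on both sides: add storage element by element,
        # with no permutation in either direction.
        if self.offset_fn is not other.offset_fn or self.shape != other.shape:
            raise ValueError("layouts differ; convert one operand first")
        summed = [a + b for a, b in zip(self.physical, other.physical)]
        return LayoutList(summed, self.shape, self.offset_fn)

    def to_logical(self):
        n_rows, n_cols = self.shape
        return [
            [self.physical[self.offset_fn(i, j)] for j in range(n_cols)]
            for i in range(n_rows)
        ]


def col_major_4x8(i, j):
    return 4 * j + i


x = [[8 * i + j for j in range(8)] for i in range(4)]
lx = LayoutList.from_logical(x, col_major_4x8)
doubled = (lx + lx).to_logical()
```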

MLIR Backend

The lego MLIR dialect defines layout operations (gen_p, reg_p, row, col, order_by, group_by, tile_by, apply, apply_inverse) with types !lego.layout and !lego.view<T>. Layouts may contain symbolic (SymPy) dimensions, which are lowered to MLIR function parameters and resolved to concrete values at invocation time. The dialect includes passes for:

Normalization -- desugar row/col/tile_by to primitive reg_p/order_by/group_by
Lowering -- lego ops to arith/scf/affine
Simplification -- optimize divui/remui patterns, distributive factoring (muli(a,c) + muli(b,c) → muli(addi(a,b), c))
Strength Reduction -- convert power-of-2 muli/divui/remui to shift/mask operations
Verification -- unified lego.check op for bijectivity, GPU bank conflicts, and memory coalescing (SMT-backed via Z3)
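The rewrites named above can be sanity-checked in a few lines of Python. This sketch verifies the power-of-two shift/mask replacements for unsigned operands, the distributive factoring identity, and bijectivity of a layout. It uses brute-force enumeration in place of the SMT (Z3) backend, and every function name is hypothetical.

```python
def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


def strength_reduce(op, constant):
    """Return a shift/mask replacement for `x op constant` on unsigned x,
    or None when the constant is not a power of two."""
    if not is_power_of_two(constant):
        return None
    shift = constant.bit_length() - 1
    if op == "muli":
        return lambda x: x << shift
    if op == "divui":
        return lambda x: x >> shift
    if op == "remui":
        return lambda x: x & (constant - 1)
    raise ValueError(f"unsupported op: {op}")


def factoring_holds(limit=16):
    """muli(a,c) + muli(b,c) == muli(addi(a,b), c) on a small integer range."""
    values = range(limit)
    return all(a * c + b * c == (a + b) * c for a in values for b in values for c in values)


def is_bijective(offset_fn, shape):
    """Brute-force stand-in for an SMT proof: every offset in
    [0, rows*cols) must be hit exactly once."""
    rows, cols = shape
    offsets = sorted(offset_fn(i, j) for i in range(rows) for j in range(cols))
    return offsets == list(range(rows * cols))


# Each reduced form should match the original operator on unsigned inputs.
reductions_match = all(
    strength_reduce("muli", 8)(x) == x * 8
    and strength_reduce("divui", 8)(x) == x // 8
    and strength_reduce("remui", 8)(x) == x % 8
    for x in range(1024)
)
```

Enumeration only covers concrete shapes; an SMT-backed check can also reason about symbolic dimensions, which is the case brute force cannot reach.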

Seven lowering pipelines target different backends:

Pipeline	Target	Output	Shared memory
`lego-to-llvm`	CPU	LLVM IR (X86, AArch64)	N/A
`lego-to-nvvm`	CUDA	PTX/cubin via NVPTX	Yes
`lego-to-rocdl`	AMD	HSACO via AMDGPU	Yes
`lego-to-xevm`	Intel GPU	LLVM SPIR-V + XeVM binary	Yes
`lego-to-spirv`	Vulkan/WebGPU/Metal/WebGL	SPIR-V binary; naga converts to WGSL/MSL/GLSL	Yes (workgroup)
`lego-to-llvmspirv`	SPIR-V (OpenCL)	LLVM dialect with SPIR-V calling conventions	Yes
WASM (Emscripten)	Browser	`lego_driver.wasm` — full compiler in the browser	N/A

The NVVM, ROCDL, XeVM, and LLVM SPIR-V backends share the same three-phase architecture:

1. buildLegoGPUOutlinePipeline -- LEGO lower + GPU kernel outlining
2. Backend-specific GPU-to-LLVM conversion (GPUToNVVM / GPUToROCDL / GPUToLLVMSPV / XeVM)
3. buildGPUHostLLVMPipeline -- host-side LLVM lowering

Example: compile for sm_80 with max optimization:

lego-to-nvvm{chip=sm_80 opt-level=3}
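When pipeline strings are assembled from a script, a small formatter keeps the `name{key=value ...}` syntax in one place. The helper below is hypothetical and only reproduces the textual form shown above; `chip` and `opt-level` are the only options named in this document, so any other option passed to it would be an assumption.

```python
def format_pipeline(name, **options):
    """Render a pipeline name with options, e.g. name{key=value key=value}.

    Python keyword names use underscores, so they are converted to the
    hyphenated spelling used in the example above (opt_level -> opt-level).
    """
    if not options:
        return name
    rendered = " ".join(
        f"{key.replace('_', '-')}={value}" for key, value in options.items()
    )
    return f"{name}{{{rendered}}}"


pipeline = format_pipeline("lego-to-nvvm", chip="sm_80", opt_level=3)
```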

Requirements

Dependency	Version	Notes
Python	>= 3.12	Tested with 3.12, 3.13, 3.14
LLVM/MLIR	commit `7477045`	Included as a submodule
CMake	>= 3.20
Ninja		Recommended build generator
NumPy	2.1.2
SymPy	1.14.0

Optional:

Dependency	Used by
PyTorch	Tensor API, `torch.compile`
Triton	Triton JIT frontend
cuda.tile	cuTile JIT frontend
Numba	Numba CUDA frontend
JAX	JAX JIT frontend
wgpu	Vulkan/WebGPU execution verification
naga-cli	SPIR-V to WGSL/MSL/GLSL conversion (`cargo install naga-cli`)

Installation

Quick install (Python package only)

pip install lego-layout

This installs the core layout algebra and frontends. The MLIR dialect native extensions are included in the wheel when available.

Platform support:

Platform	Wheel tag	GPU backends included
Linux x86_64	`manylinux_2_28` (glibc 2.28+: RHEL 8+, Ubuntu 20.04+, Debian 11+)	CUDA (PTX), ROCm, Intel (XeVM), Vulkan, WebGPU, Metal, LLVM SPIR-V
macOS ARM64	`macosx_15_0_arm64`	CUDA (PTX), Intel (XeVM), Vulkan, WebGPU, Metal, LLVM SPIR-V

All GPU backends are cross-compilers — no GPU hardware required at install time. The naga binary is bundled for SPIR-V to WGSL/MSL/GLSL conversion.

Development install

1. Clone and set up the environment

git clone https://github.com/tavakkoliamirmohammad/lego.git
cd lego
./scripts/setup.sh
source venv/bin/activate
pip install -e ./python

2. Build the MLIR dialect

Monolithic build (builds LLVM/MLIR + LEGO together):

cmake -S . -B build -DLEGO_MONOLITHIC_LLVM=ON
cmake --build build -j$(nproc) --target check-lego

Decoupled build (uses a prebuilt MLIR for fast iteration):

cmake -S . -B build -DMLIR_DIR=<mlir_build>/lib/cmake/mlir -DLEGO_MONOLITHIC_LLVM=OFF
cmake --build build -j$(nproc) --target check-lego

The build system automatically detects and uses fast linkers (mold/lld) and ccache.

To customize LLVM targets (default X86;NVPTX;AMDGPU;SPIRV):

cmake -S . -B build -DLEGO_MONOLITHIC_LLVM=ON -DLEGO_LLVM_TARGETS="X86;NVPTX;AMDGPU;SPIRV;AArch64"

GPU Runners

GPU execution tests are controlled by per-backend flags, auto-detected from hardware:

Flag	Hardware	What it enables
`LEGO_ENABLE_CUDA_RUNNER`	NVIDIA GPU (`nvidia-smi`)	CUDA kernel execution via `mlir-runner`
`LEGO_ENABLE_ROCM_RUNNER`	AMD GPU (`rocm-smi`)	ROCm kernel execution
`LEGO_ENABLE_METAL_RUNNER`	macOS Metal GPU	Metal/Vulkan/WebGPU execution via `wgpu`

If any runner is enabled, the SPIR-V execution tests (Vulkan, WebGPU, Metal) also run, since SPIR-V works on any GPU backend. To enable a runner explicitly:

cmake -S . -B build -DLEGO_ENABLE_METAL_RUNNER=ON   # macOS Metal
cmake -S . -B build -DLEGO_ENABLE_CUDA_RUNNER=ON     # NVIDIA CUDA
cmake -S . -B build -DLEGO_ENABLE_ROCM_RUNNER=ON     # AMD ROCm

Testing

# MLIR lit tests
cmake --build build --target check-lego

# Python tests
cmake --build build --target check-lego-python

# Compile-only puzzle tests (no GPU required — tests all 7 backends)
cmake --build build --target check-lego-puzzles-compile

# GPU puzzle tests (requires at least one runner enabled)
cmake --build build --target check-lego-puzzles

# All tests
cmake --build build --target check-lego-all

Citation

If you use LEGO in your research, please cite:

Amir Mohammad Tavakkoli, Cosmin E. Oancea, and Mary Hall. "LEGO: A Layout Expression Language for Code Generation of Hierarchical Mapping." In 2026 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 228-241, 2026.

@INPROCEEDINGS{tavakkoli2026lego,
  author={Tavakkoli, Amir Mohammad and Oancea, Cosmin E. and Hall, Mary},
  booktitle={2026 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)},
  title={LEGO: A Layout Expression Language for Code Generation of Hierarchical Mapping},
  year={2026},
  pages={228-241},
  keywords={Codes;Algebra;Shape;Instruction sets;Layout;Graphics processing units;Organizations;Optimization;Indexing;Python;data layout;MLIR compiler;domain-specific optimization tools},
  doi={10.1109/CGO68049.2026.11394846}}

The paper artifact is available at: https://zenodo.org/records/17633994

License

MIT License. See LICENCE.md.
