macOS 27 beta: MPSGraph scratch-heap overflow on Gemma 4 12B full-attention (head_dim 512) layers + macOS AOT .aimodelc load regression (AIModelError 3)

**Two related macOS 27.0 beta Core AI runtime bugs that block 12B-class Gemma 4 on Mac GPU.** A clean one-layer bisection isolates the trigger. Numerics are verified-correct and the graph runs at smaller sizes — both blockers are runtime / MPSGraph-side, not model-side. (Related to #5, another macOS-27-beta MPSGraph lowering bug.)

### Environment
- macOS **27.0** build **26A5353q** (beta), Apple **M4 Max** (`applegpu_g16s`)
- `coreai-build` **3600.67.5.8.1** (MetalToolchain-v27.1.5194)
- coreai-models pipelined engine (`llm-runner` / `llm-benchmark`), `COREAI_CHUNK_THRESHOLD=1`
- Model: a **Gemma 4 12B** dense decode-only pipelined bundle (in-graph embed+head, one growing KV pair, dual head_dim 256 sliding / 512 full, `attention_k_eq_v` full layers)

### Bug 1 — MPSGraph scratch-heap overflow on full-attention (head_dim 512) layers
At the **first decode token** the engine aborts:
```
allocateMTLBufferFromMTLHeap: offset 198400 + size 16384 exceeds heap total 212992
.../MPSRuntime/Operations/GPUMemrefOps.mm:687: failed assertion
  'Failed to acquire the source buffer for the ViewOp'
```
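As a sanity check on the assertion message, the overshoot can be computed directly from the logged numbers. This is a minimal sketch: the helper name `heap_overshoot` and the regular expression are illustrative assumptions, not part of any Apple tooling. Only the log line itself comes from the crash above.

```python
import re

# The heap-overflow line exactly as logged in the crash above.
LOG_LINE = ("allocateMTLBufferFromMTLHeap: offset 198400 + size 16384 "
            "exceeds heap total 212992")

def heap_overshoot(line: str) -> int:
    """Return how many bytes the requested range extends past the heap end.

    Hypothetical helper for this report; the pattern only matches the
    message format shown above.
    """
    m = re.search(r"offset (\d+) \+ size (\d+) exceeds heap total (\d+)", line)
    if m is None:
        raise ValueError("not a heap-overflow line")
    offset, size, total = map(int, m.groups())
    return offset + size - total

print(heap_overshoot(LOG_LINE))  # 1792
```

The request runs 1792 B past a 212992 B (208 KB) heap, so the overflow is small relative to the 16384 B allocation.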
**Decisive bisection:**
- `--num-layers 5` (all *sliding*, head_dim 256) → **runs** (~409 tok/s)
- `--num-layers 6` (adds the first *full* layer: head_dim 512, 16 query heads) → **crashes**
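For anyone reproducing this on a different bundle, the layer-count bisection can be sketched as a plain binary search. Everything here is an assumption for illustration: `runs_ok` is a hypothetical callback that would build and run a bundle with the given number of layers, the upper bound of 48 is arbitrary, and the search assumes that once a layer count crashes, every larger count crashes too.

```python
from typing import Callable

def first_failing_layer_count(runs_ok: Callable[[int], bool],
                              lo: int, hi: int) -> int:
    """Return the smallest layer count that crashes.

    Preconditions (assumed, not checked): `lo` is known to run, `hi` is
    known to crash, and failure is monotonic in the layer count.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if runs_ok(mid):
            lo = mid   # mid still runs, so the first failure is above it
        else:
            hi = mid   # mid crashes, so the first failure is at or below it
    return hi

# Stand-in predicate mirroring the observation above: 5 layers run, 6 crash.
# A real predicate would launch the runner and inspect its exit status.
observed = lambda n: n <= 5

print(first_failing_layer_count(observed, lo=1, hi=48))  # 6
```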

The failing buffer is exactly `[1, 16, 1, 512]` fp16 = **16384 B**, the full layer's `q_proj` output. The request ends at byte 214784 (198400 + 16384), which is 1792 B past the end of MPSGraph's 212992 B (208 KB) decode scratch heap, so the heap appears to be under-sized by just under 2 KB. The failing buffer's size scales with the number of full layers (16 KB at 1 full layer, 32 KB at 2). Sliding-layer Q (`[1, 16, 1, 256]` = 8 KB) fits, and **Gemma 4 E2B/E4B full layers (8 heads × 512 = 8 KB) also fit and run**; only the 12B's 16-head × 512 Q tips the heap over. The crash is **invariant** to every graph-source change tried (KV cache pad↔replicate, uniform narrow, `.contiguous()` on Q and on K/V, vanilla vs HF SDPA): identical heap / offset / size each time.
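The buffer sizes quoted above follow from the tensor shapes. The sketch below recomputes them and compares each against the space left at the logged offset. The function name `q_bytes` is made up for this illustration. Reusing the failing offset for the other configurations is only an assumption, since the report does not say where the allocator sits in those runs.

```python
FP16_BYTES = 2  # bytes per fp16 element

def q_bytes(num_heads: int, head_dim: int, batch: int = 1, seq: int = 1) -> int:
    """Size in bytes of a [batch, num_heads, seq, head_dim] fp16 tensor."""
    return batch * num_heads * seq * head_dim * FP16_BYTES

HEAP_TOTAL = 212992      # from the assertion message
FAILING_OFFSET = 198400  # from the assertion message
remaining = HEAP_TOTAL - FAILING_OFFSET  # space left at the failing offset

configs = {
    "12B full layer (16 heads x 512)": q_bytes(16, 512),
    "12B sliding layer (16 heads x 256)": q_bytes(16, 256),
    "E2B/E4B full layer (8 heads x 512)": q_bytes(8, 512),
}

for name, size in configs.items():
    # Illustrative only: assumes the same offset applies to every config.
    verdict = "fits" if size <= remaining else "overflows"
    print(f"{name}: {size} B, {verdict} in {remaining} B")
```

Under that assumption, only the 16384 B request exceeds the 14592 B left in the heap. Both 8192 B cases fit.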

### Bug 2 — AOT `.aimodelc` fails to load on macOS (regression vs iOS)
Pre-compiling for the **correct M4 Max arch** succeeds:
```
xcrun coreai-build compile <bundle>.aimodel --platform macOS --architecture h16s --expect-frequent-reshapes -o /tmp/aot
```
…but loading the resulting `.aimodelc` fails:
```
CoreAIDelegates.AIModelError error 3      (raw AIModel.load)
invalidCompiledModel                      (llm-runner / LanguageBundle)
```
This is **not** specific to the Bug-1 graph: a model that JIT-runs perfectly (`--num-layers 5`, all sliding) **also fails to AOT-load** with the same `AIModelError 3`. So this macOS build appears unable to load *any* precompiled `.aimodelc` for a macOS target, while the **same Core AI runtime loads AOT `.aimodelc` fine on iOS** (h18p bundles run on iPhone 17 Pro). The result is the same with or without `--expect-frequent-reshapes`, and with the source `.aimodel` present alongside the compiled `.aimodelc`.

### Impact
Together these block all GPU paths for 12B-class Gemma 4 (and likely any model whose per-layer attention intermediate exceeds the scratch heap) on Mac: JIT crashes (Bug 1), AOT-load is rejected (Bug 2), and the CPU delegate also fails to load. If Bug 2 were fixed, AOT would work around Bug 1 exactly as iOS already does.

