fix: compile error on RDNA4 (gfx1201) ROCm by Kaihui-AMD · Pull Request #815 · turboderp-org/exllamav2

Kaihui-AMD · 2026-05-15T06:54:16Z

Summary — AMD consumer GPUs

ExLlamaV2 fails to compile on RDNA4 GPUs (gfx1201: RX 9070, RX 9060, R9700) with ROCm 7.x.
rms_norm.cu and layer_norm.cu define NUM_WARPS = 1024 / warpSize under USE_ROCM. On HIP, warpSize is a runtime __device__ variable (RDNA4 supports both wave32 and wave64), so NUM_WARPS is not a compile-time constant. This causes:

__shared__ float sums[NUM_WARPS] — variable length array in shared memory
Host functions referencing NUM_THREADS — __device__ variable used in __host__ context
error: variable length array declaration cannot have 'static' storage duration shared float sums[NUM_WARPS];

FAILED: rms_norm.cuda.o /opt/venv/lib/python3.12/site-packages/exllamav2/exllamav2_ext/hip/rms_norm.hip:103:22: error: variable length array declaration cannot have 'static' storage duration shared float sums[NUM_WARPS]; ^ ~~~~~~~~~

FAILED: layer_norm.cuda.o /opt/venv/lib/python3.12/site-packages/exllamav2/exllamav2_ext/hip/layer_norm.hip:80:22: error: variable length array declaration cannot have 'static' storage duration shared float sums[NUM_WARPS]; ^ ~~~~~~~~~

FAILED: rms_norm.cuda.o /workspace/exllamav2/exllamav2/exllamav2_ext/hip/rms_norm.hip:222:18: error: reference to device function 'operator int' in host function

FAILED: layer_norm.cuda.o /workspace/exllamav2/exllamav2/exllamav2_ext/hip/layer_norm.hip:205:18: error: reference to device function 'operator int' in host function

Fix

Add compile-time constants MAX_NUM_WARPS (32) and NUM_THREADS_CONST (1024):

__shared__ arrays use MAX_NUM_WARPS (upper bound: 1024 / 32 = 32 warps)
Host functions use NUM_THREADS_CONST (NUM_WARPS * WARP_SIZE = 1024 for all warp sizes)
CUDA path unchanged — same values defined for both branches

Test environment

GPU: AMD Radeon R9700 Pro (gfx1201, RDNA 4), 32 GB VRAM
ROCm: 7.2.3
PyTorch: 2.9.1+rocm7.2.3
Container: rocm/pytorch:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.9.1

Test results

Build: pip install --no-build-isolation -e . — success (was failing before)
test_hgemm.py: 24/24 GEMM shapes passed

import torch
from exllamav2.ext import exllamav2_ext as ext_c

# Test RMS norm
dim = 4096
x = torch.randn(1, dim, dtype=torch.half, device='cuda:0')
w = torch.randn(dim, dtype=torch.half, device='cuda:0')
y = torch.empty_like(x)
ext_c.rms_norm(x, w, y, 1e-6)
print(f'rms_norm: input={x.shape}, output={y.shape}, has_nan={torch.isnan(y).any().item()}, has_inf={torch.isinf(y).any().item()}')

# Test Layer norm
x2 = torch.randn(1, dim, dtype=torch.half, device='cuda:0')
w2 = torch.randn(dim, dtype=torch.half, device='cuda:0')
b2 = torch.randn(dim, dtype=torch.half, device='cuda:0')
y2 = torch.empty_like(x2)
ext_c.layer_norm(x2, w2, b2, y2, 1e-6)
print(f'layer_norm: input={x2.shape}, output={y2.shape}, has_nan={torch.isnan(y2).any().item()}, has_inf={torch.isinf(y2).any().item()}')

print('ALL KERNEL TESTS PASSED')
"""output
rms_norm: input=torch.Size([1, 4096]), output=torch.Size([1, 4096]), has_nan=False, has_inf=False
layer_norm: input=torch.Size([1, 4096]), output=torch.Size([1, 4096]), has_nan=False, has_inf=False
ALL KERNEL TESTS PASSED
"""

Signed-off-by: Kaihui-AMD <Kaihui.Tang@amd.com>

Kaihui-AMD · 2026-05-18T03:24:01Z

@turboderp @kingbri1 Would you mind taking a look? This fixes an RDNA4 (gfx1201) compile blocker in rms_norm.cu and layer_norm.cu — two small changes, CUDA path unchanged. Thanks!

fix compile error on RDNA4

1fd2f57

Signed-off-by: Kaihui-AMD <Kaihui.Tang@amd.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: compile error on RDNA4 (gfx1201) ROCm#815

fix: compile error on RDNA4 (gfx1201) ROCm#815
Kaihui-AMD wants to merge 1 commit into
turboderp-org:masterfrom
Kaihui-AMD:fix/rdna4-rocm-vla-shared-memory

Kaihui-AMD commented May 15, 2026 •

edited

Loading

Uh oh!

Kaihui-AMD commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Kaihui-AMD commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary — AMD consumer GPUs

Fix

Test environment

Test results

Uh oh!

Kaihui-AMD commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Kaihui-AMD commented May 15, 2026 •

edited

Loading