
rocprim ops crash on gfx1201 (RDNA4) at large data sizes (>500K elements) #776

@ptj0225


System Info

  • GPU: AMD Radeon AI PRO R9700 (gfx1201, wavefront=32)
  • ROCm: 7.2.3
  • PyTorch: 2.9.1+rocm6.3
  • OS: Ubuntu 24.04

Issues Found

1. Compile Error: texture_cache_iterator.hpp calls memset, which is not available to host code on AMD

/opt/rocm-7.2.3/include/rocprim/iterator/texture_cache_iterator.hpp:178:

memset(&resourse_desc, 0, sizeof(hipResourceDesc));

The memset declared in amd_detail/amd_device_functions.h is __device__ only, and bind_texture runs on the host, so the call does not resolve. Adding #include <cstring> before rocprim.hpp works around the compile error, but texture object creation (hipCreateChannelDesc) exists only under nvidia_detail/ and is not available on AMD. The texture_cache_iterator should therefore be guarded with #ifdef __HIP_PLATFORM_NVIDIA__.

2. Runtime Crash: Memory Access Fault at large data sizes

When processing more than 500K elements on gfx1201, rocprim operations (exclusive_scan, radix_sort_pairs, select, run_length_encode) corrupt the heap, which later surfaces as a crash in hipFree.

Reproducer (each rocprim operation succeeds standalone at N=800K, but this chained sequence crashes):

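import torch
from cumesh._C import CuMesh

V = 800000        # vertex count; sizes below ~500K run cleanly on gfx1201
F = int(V * 1.5)  # face count
# On ROCm builds of PyTorch, device="cuda" selects the HIP device.
v = torch.randn(V, 3, device="cuda")                               # random vertex positions
f = torch.randint(0, V, (F, 3), device="cuda", dtype=torch.int32)  # random triangle indices
m = CuMesh()
m.init(v, f)
m.get_edges()
m.get_vertex_face_adjacency()
m.get_edge_face_adjacency()  # ← Memory access fault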

Root cause hypothesis: gfx1201 uses a wavefront size of 32 (GCN/CDNA GPUs use 64). rocprim's internal temp storage calculations and/or thread block layouts may assume a wavefront of 64, which would overflow buffers in multi-stage pipelines.
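To make the hypothesis concrete, here is a toy model (not rocprim's code) of what a wavefront-size mismatch would do to a scratch buffer. It assumes a hypothetical layout with one 8-byte slot per wavefront per block, sized on the host for wave64 but written by kernels running as wave32. The helper names and the slot size are illustrative assumptions.

```python
# Toy model of the wavefront-size hypothesis. This is NOT rocprim's code:
# the one-slot-per-wavefront layout and 8-byte slots are assumptions.


def warps_per_block(block_size: int, wave_size: int) -> int:
    """Wavefronts needed to cover one thread block (ceiling division)."""
    return (block_size + wave_size - 1) // wave_size


def scratch_bytes(block_size: int, wave_size: int, slot_bytes: int, num_blocks: int) -> int:
    """Scratch size if every wavefront of every block writes one slot."""
    return warps_per_block(block_size, wave_size) * slot_bytes * num_blocks


def overflow_bytes(n: int, block_size: int, assumed_wave: int, actual_wave: int,
                   slot_bytes: int = 8) -> int:
    """Bytes written past the end of a buffer that was sized for `assumed_wave`."""
    num_blocks = (n + block_size - 1) // block_size
    allocated = scratch_bytes(block_size, assumed_wave, slot_bytes, num_blocks)
    written = scratch_bytes(block_size, actual_wave, slot_bytes, num_blocks)
    return max(0, written - allocated)


if __name__ == "__main__":
    # Host sized the buffer for wave64; the kernel actually runs as wave32 (gfx1201).
    for n in (100_000, 500_000, 800_000):
        print(n, overflow_bytes(n, block_size=256, assumed_wave=64, actual_wave=32))
```

With 256-thread blocks, wave32 needs 8 slots per block where wave64 needs 4. In this model the overrun is present at every size and grows linearly with N (100,000 bytes at N=800K). Whether it produces a visible fault would depend on what lies past the buffer, which could make the failure size-dependent, but the model does not prove that this is what rocprim does.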

Note: The same code works correctly on an NVIDIA RTX 3060, and on gfx1201 at smaller scales (<500K elements).

Run standalone at N=800K, the rocprim operations (exclusive_scan, radix_sort_pairs, select) do NOT crash. The crash occurs only when multiple operations are chained in sequence, as in a real-world DAG processing pipeline such as mesh simplification or connectivity analysis.
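Because a GPU memory access fault aborts the whole process, one way to pin down the size threshold is to run the chained reproducer in a fresh interpreter for each size and bisect. This is a sketch: `pipeline_fails` and `smallest_failing_size` are hypothetical helpers, the CuMesh calls are copied from the reproducer, and the search assumes failure is monotone in V. Heap corruption does not guarantee that, so individual probes may need repeating.

```python
import subprocess
import sys
from typing import Callable

# Chained CuMesh calls copied from the reproducer; V is read from argv.
REPRO = """
import sys
import torch
from cumesh._C import CuMesh
V = int(sys.argv[1])
F = int(V * 1.5)
v = torch.randn(V, 3, device="cuda")
f = torch.randint(0, V, (F, 3), device="cuda", dtype=torch.int32)
m = CuMesh()
m.init(v, f)
m.get_edges()
m.get_vertex_face_adjacency()
m.get_edge_face_adjacency()
torch.cuda.synchronize()  # surface any asynchronous GPU fault before exit
"""


def pipeline_fails(num_vertices: int, timeout_s: float = 600.0) -> bool:
    """Run the reproducer in a child process; a crash or error counts as failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", REPRO, str(num_vertices)],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return True  # treat a hang as a failure too
    return result.returncode != 0


def smallest_failing_size(fails: Callable[[int], bool], good: int, bad: int) -> int:
    """Binary search between a passing size `good` and a failing size `bad`.

    Assumes fails(good) is False, fails(bad) is True, and failure is monotone in size.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if fails(mid):
            bad = mid
        else:
            good = mid
    return bad
```

For example, `smallest_failing_size(pipeline_fails, 400_000, 800_000)` would bisect between a size reported to work and the size used in the reproducer. Swapping `REPRO` for a single standalone operation gives a direct chained-versus-standalone comparison.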
