[Proposal]: Per-Architecture DWP Packaging for Split-DWARF

## Background
Most production binaries are built with split-DWARF to keep binary size down: most of the debug info is written to separate `.dwo` files rather than into the binary. During development on a local machine, the debugger can locate these `.dwo` files relative to the build tree. However, the original build tree is gone in any production or remote debugging scenario, such as coredump analysis for SEVs. Without a `.dwp` file, the debugger has no debug info from which to retrieve variable names and source locations.

DWP packaging makes split-DWARF debug info usable outside the original build environment by bundling the `.dwo` files into a single `.dwp` file. We currently have no standard for DWP packaging of GPU DWOs, and we can't include device-side debug info in the binary without greatly bloating production binary sizes. As a result, GPU debugging in production is non-functional.

## Architecture Mismatch
When we compile a HIP source file with split-DWARF for multiple GPU architectures, we get one DWO file per target architecture. For example, if we run `hipcc -gsplit-dwarf -g --offload-arch=gfx942 --offload-arch=gfx950 -c foo.hip`:
- Host compilation (x86_64): produces `foo.dwo`
- Device compilation per GPU arch: produces `foo_gfx942.dwo`, `foo_gfx950.dwo`, etc.

The resulting `.dwo` files target different CPU/GPU architectures, and must all be correctly packaged into DWPs for both host and device debugging to work in production scenarios.
For example:
```
$ hipcc -gsplit-dwarf -g --offload-arch=gfx942 --offload-arch=gfx950 -c saxpy.hip

$ llvm-dwarfdump --debug-info saxpy.dwo | head -5
saxpy.dwo:	file format elf64-x86-64
.debug_info.dwo contents:
0x00000000: Compile Unit: ... unit_type = DW_UT_split_compile, ...
  DWO_id = 0x2ba31cef503e1158

$ llvm-dwarfdump --debug-info saxpy_gfx942.dwo | head -5
saxpy_gfx942.dwo:	file format elf64-amdgpu
.debug_info.dwo contents:
0x00000000: Compile Unit: ... unit_type = DW_UT_split_compile, ...
  DWO_id = 0xad21fe8a2dfeb5bb

$ llvm-dwarfdump --debug-info saxpy_gfx950.dwo | head -5
saxpy_gfx950.dwo:	file format elf64-amdgpu
.debug_info.dwo contents:
0x00000000: Compile Unit: ... unit_type = DW_UT_split_compile, ...
  DWO_id = 0x10ccaff1dd367ead
```
We need a standardized way to package these CPU and GPU DWOs.

## Options
### Option 1: Treat Architectures as Compatible
This option packages all DWOs, host and device, into a single DWP. In our tests, ROCm 6.2.1, 6.4.0, and 6.4.2 all produced identical DWO IDs across GPU architectures, causing llvm-dwp to fail with `error: duplicate DWO ID (FA993CFF029EEF25)`. With ROCm 7.0, all of these DWOs can be combined into one file without error. However, the order of inputs may matter, as llvm-dwp seems to take its output ELF container architecture from the first input file, even if the DWP contains DWARF CUs from both x86-64 and AMDGCN targets.
```
# Host DWO first → x86-64 container
$ llvm-dwp -o host_first.dwp saxpy.dwo saxpy_gfx942.dwo saxpy_gfx950.dwo
$ readelf -h host_first.dwp | grep "Machine:"
  Machine: Advanced Micro Devices X86-64

# Device DWO first → AMDGPU container
$ llvm-dwp -o device_first.dwp saxpy_gfx942.dwo saxpy.dwo saxpy_gfx950.dwo
$ readelf -h device_first.dwp | grep "Machine:"
  Machine: AMD GPU
```
ROCgdb seems to tolerate this architecture mismatch when loading the DWP, but it would still need to be taught to search this DWP for GPU code objects. Right now, it loads the GPU code object as `file:///...saxpy#offset=8192&size=9800` and tries to load `file:///...saxpy#offset=8192&size=9800.dwp`, that is, the code object URI with `.dwp` appended.
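
The duplicate-ID failure seen on the older ROCm releases can be detected before llvm-dwp is invoked. The following is a minimal sketch, not part of any existing tool: `duplicate_dwo_ids` is a hypothetical helper that reads `llvm-dwarfdump --debug-info` output on stdin and prints every DWO ID that occurs more than once. It assumes each compile unit prints its ID once, in the `DWO_id = 0x...` form shown in the dumps above.

```shell
#!/bin/sh
# Hypothetical helper: print each DWO ID that appears more than once on stdin.
# Input is expected to be `llvm-dwarfdump --debug-info` output for one or
# more .dwo files, where each compile unit prints "DWO_id = 0x<hex>".
duplicate_dwo_ids() {
    sed -n 's/.*DWO_id = \(0x[0-9a-fA-F]*\).*/\1/p' | sort | uniq -d
}

# Example use (requires llvm-dwarfdump on PATH, so it is left commented out):
#   llvm-dwarfdump --debug-info saxpy_gfx942.dwo saxpy_gfx950.dwo | duplicate_dwo_ids
```

Empty output means no two compile units in the inputs share a DWO ID.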
### Option 2: Separate DWP Per Architecture
We could produce separate DWP files per architecture. This way, the host DWP is `elf64-x86-64`, and each device DWP is `elf64-amdgpu`. There are no architecture mismatches, but a binary built for N GPU architectures gets 1 + N DWPs. For a binary `foo` compiled for both `gfx942` and `gfx950`:
- `foo.dwp`: host debug info only (x86-64)
- `foo.gfx942.dwp`: gfx942 device debug info only (AMDGPU)
- `foo.gfx950.dwp`: gfx950 device debug info only (AMDGPU)

This option would require an industry-standard naming convention, and GPU debuggers would need to be patched to look for `<binary>.<arch>.dwp`.
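
The grouping that Option 2 implies can be sketched as a dry run that only prints the llvm-dwp commands it would issue. This is an illustration under stated assumptions, not an existing build rule: `dwp_for_dwo` and `plan_dwp_commands` are hypothetical names, device DWOs are assumed to follow the `<stem>_gfx<NNN>.dwo` pattern shown earlier, and file names are assumed to contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: name the DWP that a given DWO belongs in.
#   $1 = host binary name, $2 = path to a .dwo file
# Assumption: device DWOs are named <stem>_gfx<digits...>.dwo; any other
# DWO is treated as host debug info.
dwp_for_dwo() {
    binary=$1
    dwo=$(basename "$2" .dwo)
    case "$dwo" in
        *_gfx[0-9]*) echo "$binary.gfx${dwo##*_gfx}.dwp" ;;
        *)           echo "$binary.dwp" ;;
    esac
}

# Hypothetical helper: print one llvm-dwp command per output DWP.
# Nothing is executed; the output is a plan that can be reviewed or piped to sh.
plan_dwp_commands() {
    binary=$1
    shift
    for dwo in "$@"; do
        printf '%s %s\n' "$(dwp_for_dwo "$binary" "$dwo")" "$dwo"
    done | LC_ALL=C sort | awk '
        $1 != out { if (out != "") print cmd; out = $1; cmd = "llvm-dwp -o " out }
        { cmd = cmd " " $2 }
        END { if (out != "") print cmd }'
}
```

For example, `plan_dwp_commands foo a.dwo a_gfx942.dwo a_gfx950.dwo` prints three commands, one each for `foo.dwp`, `foo.gfx942.dwp`, and `foo.gfx950.dwp`, so no DWP ever mixes x86-64 and AMDGPU inputs.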

## Proposal
We'd like to propose adopting Option 2 as the standard, and creating a debugger patch so that, when debugging a GPU code object for architecture `gfxNNNN` embedded in a host binary, the debugger searches for `<host_binary>.gfxNNNN.dwp` alongside the host binary.
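
To make the proposed lookup concrete, here is a minimal sketch of the name mapping only. It is not ROCgdb code: `device_dwp_path` is a hypothetical helper, the URI is assumed to have the `file://<path>#offset=...&size=...` form from the ROCgdb example above, and percent-encoded characters in the path are not handled.

```shell
#!/bin/sh
# Hypothetical helper: derive the per-architecture DWP path for a GPU code
# object embedded in a host binary.
#   $1 = code object URI, assumed to look like file://<path>#offset=..&size=..
#   $2 = GPU architecture, e.g. gfx942
device_dwp_path() {
    uri=$1
    arch=$2
    path=${uri#file://}   # drop the URI scheme, leaving the host binary path
    path=${path%%#*}      # drop the "#offset=...&size=..." fragment
    echo "$path.$arch.dwp"
}
```

With a made-up path, `device_dwp_path 'file:///opt/app/saxpy#offset=8192&size=9800' gfx942` yields `/opt/app/saxpy.gfx942.dwp`, which sits next to the host binary as proposed.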
