[Proposal]: Per-Architecture DWP Packaging for Split-DWARF #48

@qxy11

Background

Most production binaries are built with split-DWARF to keep binary size down: the bulk of the debug info goes into separate .dwo files instead of the binary itself. During development on a local machine, the debugger can locate these .dwo files relative to the build tree. In production or remote debugging scenarios, and in coredump analysis on SEVs, that original build tree is gone. Without a .dwp file, the debugger then has no debug info from which to retrieve variable names and source locations.

DWP packaging makes split-DWARF debug info usable outside the original build environment. Since we currently have no standard for DWP packaging of GPU DWOs and can't include device-side debug info without greatly bloating production binary sizes, GPU debugging in production is non-functional.

Architecture Mismatch

When we compile a HIP source file with split-DWARF for multiple GPU architectures, we get a separate DWO file for each architecture. For example, if we run hipcc -gsplit-dwarf -g --offload-arch=gfx942 --offload-arch=gfx950 -c foo.hip:

  • Host compilation (x86_64): produces foo.dwo
  • Device compilation per GPU arch: produces foo_gfx942.dwo, foo_gfx950.dwo, etc.

The resulting .dwo files target different CPU/GPU architectures, and must all be correctly packaged into DWPs for both host and device debugging to work in production scenarios.
For example:

$ hipcc -gsplit-dwarf -g --offload-arch=gfx942 --offload-arch=gfx950 -c saxpy.hip

$ llvm-dwarfdump --debug-info saxpy.dwo | head -5
saxpy.dwo:	file format elf64-x86-64
.debug_info.dwo contents:
0x00000000: Compile Unit: ... unit_type = DW_UT_split_compile, ...
  DWO_id = 0x2ba31cef503e1158

$ llvm-dwarfdump --debug-info saxpy_gfx942.dwo | head -5
saxpy_gfx942.dwo:	file format elf64-amdgpu
.debug_info.dwo contents:
0x00000000: Compile Unit: ... unit_type = DW_UT_split_compile, ...
  DWO_id = 0xad21fe8a2dfeb5bb

$ llvm-dwarfdump --debug-info saxpy_gfx950.dwo | head -5
saxpy_gfx950.dwo:	file format elf64-amdgpu
.debug_info.dwo contents:
0x00000000: Compile Unit: ... unit_type = DW_UT_split_compile, ...
  DWO_id = 0x10ccaff1dd367ead

We need a standardized way of packaging these CPU and GPU DWOs.
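Since packaging combines DWOs from several compilations, it can help to confirm up front that their DWO IDs are distinct. The following is a minimal sketch, assuming the `DWO_id = 0x...` text format shown in the llvm-dwarfdump output above; the helper names `dwo_ids` and `find_duplicate_ids` are hypothetical and not part of any LLVM tool.

```python
import re
from collections import defaultdict

# Matches the "DWO_id = 0x..." text as it appears in the dumps above.
DWO_ID_RE = re.compile(r"DWO_id\s*=\s*(0x[0-9a-fA-F]+)")


def dwo_ids(dump_text):
    """Return every DWO ID found in one llvm-dwarfdump text dump, as ints."""
    return [int(hex_id, 16) for hex_id in DWO_ID_RE.findall(dump_text)]


def find_duplicate_ids(dumps):
    """dumps maps file name -> dump text.

    Returns {dwo_id: [file names]} for every ID seen in more than one place.
    """
    owners = defaultdict(list)
    for name, text in dumps.items():
        for dwo_id in dwo_ids(text):
            owners[dwo_id].append(name)
    return {i: names for i, names in owners.items() if len(names) > 1}


# The three IDs from the saxpy example above.
dumps = {
    "saxpy.dwo": "  DWO_id = 0x2ba31cef503e1158\n",
    "saxpy_gfx942.dwo": "  DWO_id = 0xad21fe8a2dfeb5bb\n",
    "saxpy_gfx950.dwo": "  DWO_id = 0x10ccaff1dd367ead\n",
}
duplicates = find_duplicate_ids(dumps)
```

In practice the dump text would come from running llvm-dwarfdump --debug-info on each .dwo file; here it is inlined so the sketch runs without the toolchain.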

Options

Option 1: Treat Architectures as Compatible

In our tests, ROCm 6.2.1, 6.4.0, and 6.4.2 all produced identical DWO IDs across GPU architectures, causing llvm-dwp to fail with error: duplicate DWO ID (FA993CFF029EEF25). With ROCm 7.0, all of these DWOs can be combined into one file without issues. However, the order of inputs may matter: llvm-dwp seems to take its output ELF container architecture from the first input file, even when the DWP contains DWARF CUs from both x86-64 and AMDGCN targets.

# Host DWO first → x86-64 container
$ llvm-dwp -o host_first.dwp saxpy.dwo saxpy_gfx942.dwo saxpy_gfx950.dwo
$ readelf -h host_first.dwp | grep "Machine:"
  Machine: Advanced Micro Devices X86-64

# Device DWO first → AMDGPU container
$ llvm-dwp -o device_first.dwp saxpy_gfx942.dwo saxpy.dwo saxpy_gfx950.dwo
$ readelf -h device_first.dwp | grep "Machine:"
  Machine: AMD GPU
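The readelf check above can also be scripted. This is a sketch only: it reads the `e_machine` field straight from the ELF header, and the function names `elf_machine` and `dwp_machine` are hypothetical. The constants are the standard ELF machine numbers for x86-64 and AMD GPU.

```python
import struct

EM_X86_64 = 62   # ELF e_machine value for x86-64
EM_AMDGPU = 224  # ELF e_machine value for AMD GPU


def elf_machine(header):
    """Return e_machine given at least the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return machine


def dwp_machine(path):
    """Read the container architecture of a .dwp (or any ELF) file on disk."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

A packaging step could call `dwp_machine` on its output and compare against `EM_X86_64` or `EM_AMDGPU` to catch a DWP whose container architecture was decided by input order.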

ROCgdb seems to tolerate this architecture mismatch when loading the DWP, but it does not yet know to search this DWP for GPU code objects. Right now, it loads the GPU code object as file:///...saxpy#offset=8192&size=9800, and then tries to load file:///...saxpy#offset=8192&size=9800.dwp.

Option 2: Separate DWP Per Architecture

We could produce separate DWP files per architecture. This way, the host DWP is elf64-x86-64, and each device DWP is elf64-amdgpu. There are no architecture mismatches, but a binary built for N GPU architectures ends up with 1 + N DWPs. For a binary foo compiled for both gfx942 and gfx950:

  • foo.dwp: host debug info only (x86-64)
  • foo.gfx942.dwp: gfx942 device debug info only (AMDGPU)
  • foo.gfx950.dwp: gfx950 device debug info only (AMDGPU)

This option would require an industry-standard naming convention, and GPU debuggers would need to be patched to look for <binary>.<arch>.dwp.
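As an illustration of what a build step for Option 2 might look like, here is a sketch that groups DWO files by architecture and builds one llvm-dwp command line per group. It assumes device DWOs are named `<stem>_gfx<id>.dwo` as in the hipcc example above and treats everything else as host; a real build system would more likely track the architecture explicitly. The names `group_by_arch` and `dwp_commands` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: device DWOs end in _gfx<id>.dwo, as in the hipcc example above.
DEVICE_DWO_RE = re.compile(r"_(gfx[0-9a-z]+)\.dwo$")


def group_by_arch(dwo_paths):
    """Split DWO paths into {"host": [...], "gfx942": [...], ...}."""
    groups = defaultdict(list)
    for path in dwo_paths:
        match = DEVICE_DWO_RE.search(path)
        groups[match.group(1) if match else "host"].append(path)
    return dict(groups)


def dwp_commands(binary, dwo_paths):
    """Build one `llvm-dwp -o <output> <inputs...>` argv list per architecture.

    Outputs follow the naming proposed here: <binary>.dwp for the host and
    <binary>.<arch>.dwp for each GPU architecture.
    """
    commands = []
    for arch, paths in sorted(group_by_arch(dwo_paths).items()):
        out = f"{binary}.dwp" if arch == "host" else f"{binary}.{arch}.dwp"
        commands.append(["llvm-dwp", "-o", out, *paths])
    return commands


commands = dwp_commands("foo", ["foo.dwo", "foo_gfx942.dwo", "foo_gfx950.dwo"])
```

Each returned list can be passed to `subprocess.run`. Because every invocation only sees DWOs of a single architecture, the container architecture no longer depends on input order.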

Proposal

We'd like to propose adopting Option 2 as standard, and creating a patch so that when debugging a GPU code object for architecture gfxNNNN embedded in a host binary, the debugger searches for <host_binary>.gfxNNNN.dwp alongside the host binary.
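The lookup could be sketched roughly as follows. This is not ROCgdb code: the function name `device_dwp_candidate` is hypothetical, the example path `/opt/app/saxpy` is made up for illustration, and how the debugger learns the architecture of a code object is left to the caller.

```python
from urllib.parse import unquote, urlsplit


def device_dwp_candidate(code_object_uri, arch):
    """Map an embedded code object URI to the proposed per-arch DWP path.

    The URI has the form file://<host binary>#offset=...&size=...; the
    fragment is dropped so the DWP is looked up next to the host binary.
    """
    parts = urlsplit(code_object_uri)
    if parts.scheme != "file":
        raise ValueError("expected a file:// code object URI")
    host_binary = unquote(parts.path)
    return f"{host_binary}.{arch}.dwp"


# Hypothetical path, for illustration only.
candidate = device_dwp_candidate(
    "file:///opt/app/saxpy#offset=8192&size=9800", "gfx942"
)
```

Stripping the `#offset=...&size=...` fragment is the key difference from the current behavior, which appends `.dwp` to the whole URI.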
