Add dejagnu tests for cooperative group GWS debugging by spatrang · Pull Request #116 · ROCm/ROCgdb

spatrang · 2026-05-07T13:08:08Z

Summary

Add dejagnu coverage for debugging AMD GPU cooperative-group kernels —
i.e. kernels launched via hipLaunchCooperativeKernel /
hipLaunchCooperativeKernelMultiDevice that synchronize at the grid /
multi-grid level. On AMD GPUs these synchronization primitives are
implemented in hardware via Global Wave Sync (GWS), and they have a
distinct wave/scheduling model that has historically only been covered by
out-of-tree tests. This PR brings that coverage into the dejagnu testsuite
so it runs as part of the regular ROCgdb regression suite.

Tests added

| File | Scenario |
| --- | --- |
| `gdb.rocm/coop-group-grid-sync.{cpp,exp}` | Single-device cooperative kernel using `cooperative_groups::this_grid().sync()` (intra-device GWS), launched via `hipLaunchCooperativeKernel`. |
| `gdb.rocm/coop-group-multi-grid-sync.{cpp,exp}` | Multi-device cooperative kernel using both `this_grid().sync()` and `cooperative_groups::this_multi_grid().sync()` (intra + cross-device GWS), launched via `hipLaunchCooperativeKernelMultiDevice`. |

What gets verified

coop-group-grid-sync.exp — two sub-tests:

test_break_around_grid_sync
- Hit a breakpoint before grid.sync() inside a cooperative dispatch.
- Confirm multiple AMDGPU Wave threads are stopped (waves participating
  in the GWS barrier).
- Confirm info dispatches lists the cooperative dispatch.
- Move the breakpoint to after grid.sync() and continue: it must
  fire (proves GWS-protected code can be debugged across the barrier).
- Continue to clean program exit.

test_threads_in_coop_kernel
- For every AMDGPU Wave parked inside the kernel, switch to it and
  confirm bt 1 reports a frame inside coop_grid_sync_kernel.

coop-group-multi-grid-sync.exp — runs in non-stop mode:

- After continue -a &, confirm a kernel-side breakpoint fires inside
  coop_multi_grid_sync_kernel. Per-GPU child breakpoint instances
  (Breakpoint X.Y) are observed for every participating GPU.
- Continue all threads to program exit, which only succeeds if both
  this_grid().sync() and this_multi_grid().sync() release correctly
  under the debugger.

The host-side post-conditions in the .cpp programs additionally validate
the cooperative semantics numerically (cross-workgroup data dependency for
the single-device case, cross-device sum aggregation for the multi-device
case), so any regression in GWS behavior under the debugger turns into a
test failure rather than a silent miscompare.

Skip / unsupported handling

The tests degrade cleanly on systems that cannot run them:

- Single-device test: queries cooperativeLaunch; if unsupported the
  program prints a recognizable message and exits, and the .exp marks
  the test UNSUPPORTED.
- Multi-device test: requires >= 2 GPUs and
  cooperativeMultiDeviceLaunch on every device. It is also gated by
  the existing hip_devices_support_debug_multi_process requirement.
  Any of those missing → UNSUPPORTED.

No new dejagnu helpers are required; both .exp files use existing
infrastructure in lib/rocm.exp.

Out of scope / follow-ups

Intentionally left out of this PR; happy to extend if reviewers ask:

- Stepping (next / step / stepi) across grid.sync() /
  mgrid.sync() boundaries.
- Conditional breakpoints inside cooperative kernels.
- lane apply / per-lane register inspection while waves are parked at
  the GWS barrier.
- Watchpoints on cooperative shared buffers.

Copilot

Pull request overview

Adds new ROCm dejagnu coverage to exercise ROCgdb debugging of cooperative-group HIP kernels that synchronize via GWS, covering both single-device this_grid().sync() and multi-device this_multi_grid().sync() scenarios.

Changes:

- Introduces a single-device cooperative-kernel test that breaks before/after grid.sync() and validates waves/dispatch visibility.
- Introduces a multi-device cooperative-kernel non-stop test that breaks inside a multi-grid kernel and runs through grid + multi-grid barriers to completion.
- Adds two HIP C++ test programs that implement the cooperative-group synchronization patterns and validate results on the host side.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| gdb/testsuite/gdb.rocm/coop-group-grid-sync.exp | DejaGnu test for single-device cooperative kernel debugging around `this_grid().sync()`. |
| gdb/testsuite/gdb.rocm/coop-group-grid-sync.cpp | HIP program implementing single-device cooperative `grid.sync()` and host-side validation. |
| gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.exp | DejaGnu non-stop test for multi-device cooperative kernel debugging through `this_grid().sync()` + `this_multi_grid().sync()`. |
| gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.cpp | HIP program implementing multi-device cooperative launch with cross-device aggregation and validation. |

lancesix

Hi,
Thanks a lot for this, this is a great starting point.

My main concern for now is gfx110x. We do not support debugging cooperative groups on those (a documented limitation), so the testcase should detect them and not FAIL. It is known that the test will not pass there even though the arch does support GWS.

I have added a couple of small comments; I'll get back to a more detailed review after the gfx11 concern has been addressed.

spatrang · 2026-05-11T06:39:45Z

Addressed. Added a supports_cooperative_groups helper in lib/rocm.exp that excludes gfx1100/1101/1102/1103, and both .exp files now require it, so on gfx110x the run reports UNSUPPORTED: …: require failed: supports_cooperative_groups instead of FAIL. Mirrors the existing hip_devices_support_debug_multi_process pattern in the same lib.
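For readers less familiar with the Tcl in lib/rocm.exp, the per-architecture gate described above amounts to a deny-list lookup. The following is only a sketch in plain C++, not the helper from this PR: the function name `arch_supports_coop_group_debug` is hypothetical, and the only facts taken from this thread are the four excluded gfx110x architecture names.

```cpp
#include <string>

// Hypothetical C++ analogue of the Tcl per-target gate discussed above.
// Returns false for the gfx110x architectures, where debugging
// cooperative groups is a documented limitation; true otherwise.
// Note: this is a debugger-side check only.  It says nothing about
// whether the runtime advertises cooperativeLaunch on the device.
bool arch_supports_coop_group_debug(const std::string &gfx_arch)
{
  static const char *const unsupported[] = {
    "gfx1100", "gfx1101", "gfx1102", "gfx1103"
  };

  for (const char *arch : unsupported)
    if (gfx_arch == arch)
      return false;

  return true;
}
```

With this sketch, `arch_supports_coop_group_debug("gfx1100")` is false, while an architecture outside the deny-list, such as gfx942, passes the gate.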

Add dejagnu coverage for debugging AMD GPU cooperative-group kernels (hipLaunchCooperativeKernel / hipLaunchCooperativeKernelMultiDevice), which synchronize at the grid / multi-grid level via Global Wave Sync (GWS). Previously covered only by out-of-tree tests.

New tests:

* gdb.rocm/coop-group-grid-sync.{cpp,exp}
  Single-device, this_grid ().sync ().
* gdb.rocm/coop-group-multi-grid-sync.{cpp,exp}
  Multi-device, this_grid ().sync () + this_multi_grid ().sync (); runs in non-stop mode.

Host-side post-conditions validate the cooperative semantics numerically, so any regression in GWS behaviour under the debugger surfaces as a test failure rather than a silent miscompare.

The tests pick a debugger-supported device at runtime and self-skip with UNSUPPORTED when the configuration is insufficient.

Two helpers added in gdb/testsuite/lib/rocm.exp: target_supports_cooperative_groups <target> (per-target gate, returns false on gfx1100/1101/1102/1103 per amd-dbgapi.h) and supports_cooperative_groups (require-gate wrapper used by both .exp files). This is a debugger-side gate, distinct from the runtime's cooperativeLaunch / cooperativeMultiDeviceLaunch flags.

aktemur · 2026-06-08T07:37:14Z

+	# debugged across the barrier.
+	delete_breakpoints
+	gdb_breakpoint \
+	    [gdb_get_line_number "after-sync line"] allow-pending


We shouldn't need allow-pending anymore. We can optionally use temporary so that we can remove delete_breakpoints below. We can use temporary for the first breakpoint, too.

Thanks — I applied the temporary part: both breakpoints are now temporary, which let me drop the redundant delete_breakpoints calls (and the same in test_threads_in_coop_kernel).

On allow-pending though - I tried removing it, but it turns out it's still needed here: these breakpoints are on lines inside the kernel (device code), which isn't loaded yet when we set them at main. Without allow-pending, gdb_breakpoint fails outright with "set breakpoint at NN" (gdb declines the unresolved location and defaults to "no" on the pending prompt). On a GPU run this turned into a hard FAIL. So I've kept allow-pending and combined it with temporary (gdb_breakpoint allow-pending temporary). The host-side marker breakpoint in the multi-device test, by contrast, resolves immediately, so temporary alone is fine there.

aktemur · 2026-06-08T07:40:37Z

+	# Verify that waves from multiple workgroups are stopped at the
+	# pre-sync breakpoint.  Counting waves alone is wave-size
+	# dependent (1 wave per workgroup on wave64 vs 2 on wave32) and
+	# would let the test pass on wave32 even if all visible waves
+	# happened to come from a single workgroup.  Instead, collect
+	# the distinct workgroup (block) coordinates from the AMDGPU
+	# Wave entries in "info threads" and require at least two
+	# distinct workgroups, which directly verifies that the
+	# cooperative dispatch's multi-workgroup property is exercised.


Because we stop here before sync, is there a guarantee that we would see 2 distinct workgroups? Wouldn't we rather have that guarantee after syncing?

Good question. The guarantee here comes from the cooperative launch itself rather than from the barrier: hipLaunchCooperativeKernel requires the entire grid to be co-resident on the device for the lifetime of the dispatch (that's precisely what makes grid.sync() safe — a non-co-resident grid could deadlock at the barrier). So all workgroups are resident from dispatch start, including before the first grid.sync(). With just 2 workgroups of 64 threads on the target there's ample occupancy, so both are present. I kept the check pre-sync deliberately: verifying the debugger can see all co-resident waves before the barrier (parked at arbitrary points in the kernel) is a more representative debugging scenario than inspecting them lined up at the sync point. Happy to also add an after-sync check if you'd like the stricter guarantee asserted explicitly.
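The "at least two distinct workgroups" check discussed in this thread can be illustrated outside of Tcl. The sketch below is an assumption-laden model, not the test's code: it assumes wave target IDs appear in "info threads" output as `AMDGPU Wave <agent>:<queue>:<dispatch>:<wave> (x,y,z)/<n>`, and the function name `count_distinct_workgroups` is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <set>
#include <sstream>
#include <string>

// Count the distinct workgroup coordinates among the "AMDGPU Wave"
// entries of an "info threads"-style listing.
// Assumed entry shape: "AMDGPU Wave 1:1:1:2 (1,0,0)/0".
std::size_t count_distinct_workgroups(const std::string &info_threads_output)
{
  // Capture the "(x,y,z)" workgroup coordinate; the "/n" wave-in-group
  // suffix is deliberately ignored so that several waves of the same
  // workgroup (e.g. on wave32) count only once.
  static const std::regex wave_re(
    R"(AMDGPU Wave [0-9:]+ \((\d+,\d+,\d+)\))");

  std::set<std::string> workgroups;
  std::istringstream lines(info_threads_output);
  std::string line;

  while (std::getline(lines, line))
    {
      std::smatch match;
      if (std::regex_search(line, match, wave_re))
        workgroups.insert(match[1].str());
    }

  return workgroups.size();
}
```

Counting coordinates rather than waves is what makes the check independent of wave size: two waves reported for workgroup (0,0,0) still count as a single workgroup.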

aktemur · 2026-06-08T08:01:30Z

+		set eligible 1
+		pass $gdb_test_name
+	    }
+	    -re "\\\[Inferior 1 \[^\r\n\]* exited normally\\\]\[^\r\n\]*\r\n$::gdb_prompt " {


Can we use -wrap here, too? It's non-stop mode but only the main thread is supposed to hit the breakpoint. So, I expect we are able to use -wrap and simplify the case to -re "\\\[Inferior 1 \[^\r\n\]* exited normally.*". Please also consider using inferior_exited_re.

Good suggestion — but this arm went away entirely with the restructure: the marker is now placed on a line reached on every run, so there is no early [Inferior 1 ... exited normally] case to match anymore. No -wrap/inferior_exited_re arm needed here as a result.

aktemur · 2026-06-08T08:03:47Z

+	    return
+	}
+
+	set n_gpus [get_integer_valueof "n_gpus" 0]


We can do this check early by putting a breakpoint at line 123 and get rid of the "advance to n-gpus-final" check above.

Done — went with this. Moved the n-gpus-final marker onto the if (n_gpus < N_USED_GPUS) line in the .cpp, which runs on every execution before the inferior's own skip-return. The .exp now does a single gdb_continue_to_breakpoint there, reads n_gpus, and reports unsupported if it's < 2. The dual-arm gdb_test_multiple and its [^\r\n]*\r\n[^\r\n]*\r\n pattern are gone. Validated on gfx942 (in-tree build + 7.14 nightly rocgdb).

aktemur · 2026-06-08T08:10:41Z

+	# In non-stop mode, hipLaunchCooperativeKernelMultiDevice
+	# produces one child breakpoint instance per participating GPU
+	# ("Breakpoint <id>.<inst>").  Collect distinct <inst> values
+	# until we have observed a stop on every GPU; only then is it
+	# safe to delete the breakpoint and let the dispatch run
+	# through both grid syncs.


What debugger behavior do we exactly test here? We could put a breakpoint after the sync and all participating blocks/grids would be there. It seems like we are rather testing the runtime, not the debugger.

Good question. You're right that "do all grids reach the kernel" is a runtime property. The debugger behavior I'm after here is gdb's side: under a single hipLaunchCooperativeKernelMultiDevice dispatch, one source breakpoint resolves to multiple device-side locations, reported as a parent breakpoint with a child instance per GPU (Breakpoint <id>.<inst>), and in non-stop mode each device-side stop is observed independently. The loop just confirms gdb reports a stop for every per-device location; the "did every grid arrive" part is left to the host-side result check in the .cpp. I've reworded the in-file comment to make this clearer, and I'm happy to switch to the simpler "one breakpoint after the sync" approach if you'd prefer.
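The bookkeeping behind "a stop for every per-device location" can be sketched as follows. This is a plain C++ model under stated assumptions, not the .exp loop itself: it assumes stop reports contain the text `Breakpoint <id>.<inst>`, and the function name `observed_instances` is hypothetical.

```cpp
#include <regex>
#include <set>
#include <string>

// Collect the distinct <inst> numbers seen in "Breakpoint <id>.<inst>"
// stop reports for one parent breakpoint.  Plain "Breakpoint <id>"
// reports (no instance suffix) and other parents are ignored.
std::set<int> observed_instances(const std::string &stop_log, int parent_id)
{
  static const std::regex bp_re(R"(Breakpoint (\d+)\.(\d+))");

  std::set<int> seen;
  for (std::sregex_iterator it(stop_log.begin(), stop_log.end(), bp_re), end;
       it != end; ++it)
    {
      if (std::stoi((*it)[1].str()) == parent_id)
        seen.insert(std::stoi((*it)[2].str()));
    }

  return seen;
}
```

A caller would compare the size of the returned set against the number of participating GPUs before letting the dispatch run through both barriers; repeated stops at the same instance (several waves on one GPU) do not inflate the count.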

aktemur · 2026-06-08T08:13:22Z

+		gdb_test "bt 1" \
+		    "#0\[^\r\n\]*coop_grid_sync_kernel\[^\r\n\]*" \
+		    "backtrace inside coop_grid_sync_kernel"


What debugger behavior are we testing here? Before the sync point, stopping waves would be inside the kernel. There is no other kernel. I'm not sure I understand the value of this test from the debugger perspective.

Thanks - that's a fair point, the bt 1 check was close to tautological with a single kernel. I've adopted approach #1 + #2 to make the debugger value explicit: instead of just confirming each wave's backtrace names the kernel, test_threads_in_coop_kernel now

- switches to each co-resident wave and reads blockIdx.x, asserting we observe more than one distinct workgroup - i.e. gdb selects the correct per-wave register context; and
- within one wave, switches between lanes and asserts threadIdx.x differs across lanes - i.e. gdb reports correct per-lane SIMT state.

Both are exercised specifically in the co-resident / GWS-barrier context, which is the cooperative-group angle the existing lane/builtin tests don't cover.

Refine the cooperative-group GWS tests for robustness and to make the debugger behaviour under test more explicit:

* coop-group-grid-sync.exp: use temporary breakpoints for the in-kernel locations (still pending, since the device code is loaded at dispatch time) and drop the redundant delete_breakpoints calls.
* coop-group-grid-sync.exp: report UNSUPPORTED instead of FAIL when the inferior self-skips because no device advertises cooperativeLaunch.
* coop-group-grid-sync.exp: have test_threads_in_coop_kernel check distinct per-wave blockIdx.x (per-wave register context) and per-lane threadIdx.x divergence (per-lane SIMT state), instead of only confirming that the backtrace names the kernel.
* coop-group-multi-grid-sync.{cpp,exp}: read n_gpus from a marker line that is reached on every execution (no early return before it), so that when fewer than two cooperative-capable GPUs are available -- including when a parallel test run restricts the visible GPUs -- the test reports UNSUPPORTED rather than FAILing.
* coop-group-{grid,multi-grid}-sync.{cpp,exp}: minor comment and GNU-style cleanups -- tab-align the in-kernel marker comment, keep hipLaunchCooperativeKernelMultiDevice on a single line, and clarify the Phase 2 data-dependency comment.
* coop-group-{grid,multi-grid}-sync.{cpp,exp}: give the per-wave and per-lane value reads explicit, unique test names, and keep the "n-gpus-final" marker string unique so gdb_get_line_number resolves the intended line.

Tested on gfx942 with an in-tree build and the 7.14 nightly rocgdb, both with all GPUs visible and with the visible set restricted to one.

spatrang requested review from Copilot and lumachad May 7, 2026 13:08

Copilot started reviewing on behalf of spatrang May 7, 2026 13:10

spatrang marked this pull request as ready for review May 7, 2026 13:14

spatrang requested a review from a team as a code owner May 7, 2026 13:14

Copilot AI reviewed May 7, 2026

Comment thread gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.exp Outdated

Comment thread gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.exp Outdated

spatrang force-pushed the users/spatrang/coop-group-gws-tests branch from af78fac to 33f1926 Compare May 7, 2026 13:35

lancesix reviewed May 7, 2026

spatrang force-pushed the users/spatrang/coop-group-gws-tests branch from 33f1926 to 73b1b20 Compare May 11, 2026 06:54

lumachad reviewed May 14, 2026

spatrang force-pushed the users/spatrang/coop-group-gws-tests branch 2 times, most recently from ae4f615 to 5aa9e19 Compare May 20, 2026 06:12

lumachad reviewed May 20, 2026

Comment thread gdb/testsuite/gdb.rocm/deep-stack.exp Outdated

Comment thread gdb/testsuite/gdb.rocm/deref-scoped-pointer.exp Outdated

Comment thread gdb/testsuite/gdb.rocm/instruction-stepping-commands.exp Outdated

Comment thread gdb/testsuite/gdb.rocm/watchpoint-basic.exp Outdated

aktemur reviewed May 20, 2026

spatrang force-pushed the users/spatrang/coop-group-gws-tests branch from 5aa9e19 to d541e7d Compare May 21, 2026 10:50

spatrang requested review from aktemur, lancesix and lumachad May 21, 2026 10:59

spatrang force-pushed the users/spatrang/coop-group-gws-tests branch from d541e7d to 28f6ae0 Compare May 21, 2026 12:45

aktemur reviewed May 26, 2026

Comment thread gdb/testsuite/gdb.rocm/coop-group-grid-sync.cpp Outdated

Comment thread gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.exp Outdated

aktemur assigned spatrang May 29, 2026

spatrang force-pushed the users/spatrang/coop-group-gws-tests branch from 28f6ae0 to 714c354 Compare June 4, 2026 14:31

spatrang requested a review from aktemur June 4, 2026 16:47

spatrang assigned aktemur and unassigned spatrang Jun 4, 2026

aktemur reviewed Jun 8, 2026

aktemur assigned spatrang and unassigned aktemur Jun 8, 2026

spatrang assigned aktemur and unassigned spatrang Jun 9, 2026

spatrang requested a review from aktemur June 9, 2026 08:12

spatrang force-pushed the users/spatrang/coop-group-gws-tests branch from d9f90d2 to a00cfb3 Compare June 9, 2026 10:57
