gdb/testsuite: support optimized code in gdb.rocm testsuite#103
gdb/testsuite: support optimized code in gdb.rocm testsuite#103spatrang wants to merge 8 commits into
Conversation
|
Hi @lancesix This PR is a re-submission of the optimized-code series that was originally
The corresponding fix-up commit is on top of this branch. Please take cc: @lumachad |
|
pre-commit seems to be grumpy. |
e05abcd to
12346d2
Compare
|
As stated elsewhere, the series only handles GDB_TESTCASE_OPTIONS. If someone passes CFLAGS directly to the compiler the optimization level checks won't work properly. Do we want to fix that as well? Maybe as a follow up if not in this one? |
|
There's a pre-commit failure. Other than that, I'm still seeing FAIL's for gdb.rocm/watchpoint-basic.exp when running the testsuite with -O1 or -O3. It passes when running with -O0. Could you please check that you're running the testsuite correctly? |
7c21341 to
a912e0e
Compare
|
Did the latest force-pushes provide anything that would fix gdb.rocm/watchpoint-basic.exp? I can't tell. |
7320b08 to
c729dec
Compare
|
For -lto runs, it looks like gdb.rocm/hip-builtin-variables.exp and gdb.rocm/hip-lang-detect.exp are still failing. |
Both hip-builtin-variables.exp and hip-lang-detect.exp now require no_lto. LTO can re-inline across TUs (defeating noinline/optnone boundaries) and rewrite the per-CU DW_LANG attribute, so the test premises don't hold at -flto. Matches the existing require no_lto pattern in watchpoint-at-end-of-shader.exp and mi-aspace.exp. |
c729dec to
65a4a86
Compare
65a4a86 to
24d7933
Compare
24d7933 to
d415a28
Compare
|
Things look clean from -O1 through -O3. I do see a FAIL with -flto though: FAIL: gdb.rocm/hip-builtin-completions.exp: language detection (dwarfdump execution) Since we claim to be clean on -flto runs as well, we should fix that or skip it. |
d415a28 to
ea1067d
Compare
|
Confirmed a clean run for -flto as well. Thanks. |
ea1067d to
42d183e
Compare
There was a problem hiding this comment.
Since we're doing a new pass. Please cleanup the following:
Inconsistent spacing around attribute across the PR.
- gdb/testsuite/gdb.rocm/hip-builtin-variables.cpp:37,45,52,71 — attribute((optnone))
- gdb/testsuite/gdb.rocm/lane-info.cpp:38,49,53 — attribute((noinline)), attribute((noinline, disable_tail_calls))
- gdb/testsuite/gdb.rocm/line-breakpoint-in-kernel.cpp:34 — attribute((optnone))
- gdb/testsuite/gdb.rocm/multi-inferior-fork.cpp:39 — attribute((optnone))
- gdb/testsuite/gdb.rocm/multi-inferior-gpu.cpp:35 — attribute((optnone))
- gdb/testsuite/gdb.rocm/names.cpp:127 — attribute((optnone))
- gdb/testsuite/gdb.rocm/ocp_mx.cpp — five occurrences of attribute((optnone))
- gdb/testsuite/gdb.rocm/scheduler-locking.cpp:36 — attribute((optnone))
- gdb/testsuite/gdb.rocm/static-global.cpp:40 — attribute((optnone))
- gdb/testsuite/gdb.rocm/step-schedlock-spurious-waves.cpp:27 — attribute((optnone))
- gdb/testsuite/gdb.rocm/unaligned-memory-access.cpp:52,57 — attribute((noinline)), attribute((optnone))
- gdb/testsuite/gdb.rocm/watch-gpu-global-from-host.cpp:27 — attribute((optnone))
- gdb/testsuite/gdb.rocm/finish.cpp:225 — new attribute((optnone)) (still missing space, only on the kernel declaration)
- Fix: normalise to attribute ((...)) everywhere added by the patch.
Typos / Comments
- Spelling: inconsistent British/American mix introduced by the PR. New comments use optimiser, optimisation, initialisation, optimised in some files (alu-exceptions.cpp, fork-exec-.cpp, precise-memory-.c, deref-scoped-pointer.exp, hip-builtin-variables.exp,
watchpoint-basic.exp) and optimization/optimized in others (the new lib/rocm.exp predicates, static-global.cpp, several require no_optimized_code skips). Existing gdb tree leans American (optimize); pick one for the whole PR. - gdb/testsuite/gdb.rocm/static-global.cpp:37–39 — the new comment explains why optnone is used but doesn't explain the unrelated change on the same line from static to non-static. The companion file deref-scoped-pointer.cpp:35–37 does call out "Non-static so a line
breakpoint…". Add the same rationale here for consistency.
Bugs
- gdb/testsuite/gdb.rocm/watch-gpu-global-from-host.cpp:34 — silently downgrades a C++ cast to a C-style cast:
- int *devGlobal;
- if (hipGetSymbolAddress (reinterpret_cast<void **> (&devGlobal), global))
+ volatile int *devGlobal;
+ if (hipGetSymbolAddress ((void **) &devGlobal, global))
- The cast change is presumably because reinterpret_cast<void**> won't strip the new volatile qualifier — but the C-style cast hides exactly that fact. This is the kind of cast that the GNU/ROCgdb style explicitly discourages in C++ code. Better: keep an explicit const_cast/reinterpret_cast pair so the volatile strip is visible, or store devGlobal as non-volatile and only mark the access path volatile.
- gdb/testsuite/gdb.rocm/lane-info.cpp:51–55 — the trailing asm volatile ("" : : : "memory"); in bar() is unreachable because sleep_forever() is an endless loop (volatile bool keep_going = true; while (keep_going) …). The author's stated rationale ("volatile asm prevents the call from being constant-folded away at the call site") refers to foo(), where it makes sense. In bar() the disable_tail_calls attribute is the load-bearing piece; the asm is dead. Either drop the asm in bar() and rely on disable_tail_calls, or annotate sleep_forever as attribute ((noreturn)) and remove the dead asm.
- gdb/testsuite/lib/rocm.exp check_for_any_flags (lines ~165–200) — uses lsearch -exact $src $flag, which relies on Tcl's string→list coercion of $src. If a future caller puts a flag with {} or unbalanced quotes into GDB_TESTCASE_OPTIONS/CFLAGS_FOR_TARGET, the implicit list conversion throws. Defensive form: if {[lsearch -exact [split $src] $flag] != -1} — split with no separators yields whitespace-tokenised words and avoids the auto-list coercion. Same logic, harder to break later.
Introduce the necessary helper functions into the gdb.rocm
framework so that support for optimized code can be implemented.
- check_for_any_flags: detects whether GDB_TESTCASE_OPTIONS,
CFLAGS_FOR_TARGET or CXXFLAGS_FOR_TARGET contains any of the
given flags. Designed for whitespace-free flags such as
-O2, -flto.
- gdb_caching_proc no_optimized_code: predicate for -O1/-O2/-O3.
- gdb_caching_proc no_lto: predicate for -flto.
Co-Authored-By: Andrey Kasaurov <andrey.kasaurov@amd.com>
Co-Authored-By: Luis Machado <luis.machado@amd.com>
Change-Id: I750d64e842413429e281bb8f7d85c4f46319a627
Signed-off-by: Sarang Patrange <spatrang@amd.com>
Adapt several gdb.rocm tests so they pass under -O1/-O2/-O3:
- Apply __attribute__((optnone)) to device functions whose
locals or call structure the tests rely on, and to host
main() where its store-then-exit pattern would otherwise
be tail-optimised.
- Make breakpoint-anchor helper functions ("done", "foo",
"bar") non-static and noinline, with an asm volatile body,
so the symbol survives the optimiser and the breakpoint
can be inserted.
- Mark local variables volatile where the test reads them
via convenience variables or takes their address.
- Restore "__global__ void" qualifier on device-attach
kernel, and mark its spin-loop guard volatile so the loop
survives optimisation.
- Relax regex/line-number expectations under optimisation
in deep-stack, hip-builtin-variables, snapshot-objfile-on-
load, and step-schedlock-spurious-waves.
- Brace expr-substitutions in deep-stack and until-tests
(tclint hygiene).
Co-Authored-By: Luis Machado <luis.machado@amd.com>
Change-Id: I750d64e842413429e281bb8f7d85c4f46319a627
Signed-off-by: Sarang Patrange <spatrang@amd.com>
Match the optional "0xADDR in " prefix explicitly: GDB prints it at -O0 (and for non-innermost frames) but omits it at -O1/-O2/-O3. The old "#N .* NAME .*" patterns also depended on frame-number column alignment for frames #10-#12. Make the address prefix optional for all frames #0-#12 so the backtrace matches at every optimization level. Signed-off-by: Sarang Patrange <spatrang@amd.com>
For tests whose semantics cannot be reconciled with optimised
code, gate them so they UNSUPPORTED-out cleanly instead of
failing.
- "require no_optimized_code" at the top of: aspace-user-
input, aspace-watchpoint, convenience_variables, device-
barrier, deref-scoped-pointer, finish, fork-exec-non-gpu-
to-gpu, lane-execution, mi-aspace, mi-lanes, nonstop-
displaced, precise-memory-warning-watchpoint, register-
watchpoint, runtime-core, shared-memory.
- "require no_lto" at the top of: hip-builtin-completions,
hip-builtin-variables, hip-lang-detect, mi-aspace,
watchpoint-at-end-of-shader.
- watchpoint-basic.exp: proc-level skip of
test_non_write_watchpoint_before_runtime_load's rwatch/
awatch sub-cases under optimisation. The HIP runtime is
initialised before main() reaches "Break before runtime
load", so amd-dbgapi is already attached and rejects the
host-side read/access watchpoint at insert time. Only
this sub-test depends on the "before runtime load"
premise; the other procs still work under optimisation,
so use a proc-level check_for_any_flags skip rather than
a file-level "require no_optimized_code".
Co-Authored-By: Andrey Kasaurov <andrey.kasaurov@amd.com>
Co-Authored-By: Luis Machado <luis.machado@amd.com>
Signed-off-by: Sarang Patrange <spatrang@amd.com>
83529fb to
9157ed9
Compare
|
Sent offline comments. |
- lib/rocm.exp: document whole-token matching in
check_for_any_flags; reword the no_optimized_code and
no_lto headers to describe what the predicates do.
- lane-info.cpp: restore static on sleep_forever() (only
foo() and bar() need to be externally visible).
- finish.{cpp,exp}: rename returnMixedSmallSruct ->
returnMixedSmallStruct; drop static for consistency with
the other return-type helpers in this file.
- until-tests.exp: anchor the line-number alternation with
a non-digit so e.g. "Line 43" no longer accepts "Line 430".
- hip-builtin-variables.cpp: restore const on the locals in
kern() (optnone on the kernel keeps them observable).
- hip-builtin-variables.exp: ASCII '-' as dash -> '--'.
- deref-scoped-pointer.cpp: clarify the comment on done()
(a stable pending-breakpoint target, not anything about
addresses at -O0).
- deep-stack.exp: capitalize "Return" in the comment.
- ocp_mx.cpp: hoist the optnone rationale to a real
file-level comment instead of leaving it dangling above
fp8e4m3_values.
Signed-off-by: Sarang Patrange <spatrang@amd.com>
lumachad
left a comment
There was a problem hiding this comment.
Just one nit this time:
- gdb/testsuite/gdb.rocm/until-tests.exp:76-79 — the test was: until lands on exactly breakpoint_loc+1. After this patch it accepts +1 or +2 unconditionally, even at -O0. That weakens the assertion for the non-optimized path it used to cover. Gate the loose pattern on
[check_for_any_flags {-O1 -O2 -O3}] and keep the strict +1 check for the unoptimized build.
I'd give Lancelot a chance to go through it again.
Restore the strict "breakpoint_loc + 1" assertion on the unoptimized
path that the previous loose pattern accidentally weakened. The
"+1 or +2" alternation is only needed when the optimizer may merge
or skip statements, so gate it on check_for_any_flags {-O1 -O2 -O3}.
Signed-off-by: Sarang Patrange <spatrang@amd.com>
Thanks! Pushed 86c2e23: gated the (+1|+2) alternation on check_for_any_flags {-O1 -O2 -O3}, restored the strict +1 check at -O0. until-tests.exp 28/28 PASS at -O0 and -O2. |
The per-TU __hip_module_ctor / __hip_fatbin_wrapper hash does not survive -flto -fgpu-rdc, leaving the wrapper symbols unresolved. Skip the test under -flto, matching the other LTO-incompatible gdb.rocm tests. Signed-off-by: Sarang Patrange <spatrang@amd.com>
lancesix
left a comment
There was a problem hiding this comment.
Note that this time I only looked at the overall diff, but having individual commits is still important, I'll check this on next round.
- deep-stack.{cpp,exp}: move "break here" marker; remove
optimization-conditional line offset.
- deref-scoped-pointer.cpp: restore static on done().
- finish.cpp: restore static on return-type helpers; drop kernel
optnone.
- lane-info.cpp: drop noreturn on sleep_forever (conflicted with
volatile loop guard).
- multi-inferior-fork.cpp: align child_after_fork() with the
canonical noinline + asm volatile anchor.
- ocp_mx.{cpp,exp}: gate on require no_optimized_code; drop
optnone decorations.
Suggested-by: Lancelot Six <lancelot.six@amd.com>
Signed-off-by: Sarang Patrange <spatrang@amd.com>
c0fae00 to
3aba2e8
Compare
aktemur
left a comment
There was a problem hiding this comment.
I'm having some difficulty following the changes introduced by patches. Some patches should've been fixups -- they modify previous changes introduced by patches in the same PR. Can we please split, re-order, re-organize the patches so that the changes are focused?
| proc check_for_any_flags {flags} { | ||
| global GDB_TESTCASE_OPTIONS CFLAGS_FOR_TARGET CXXFLAGS_FOR_TARGET | ||
|
|
||
| if {![llength $flags]} { |
There was a problem hiding this comment.
Let's please compare against 0 instead of treating like a boolean.
| return 1 | ||
| } | ||
|
|
||
| # Check whether any of the given flags appear in GDB_TESTCASE_OPTIONS, |
There was a problem hiding this comment.
Nit: It would be helpful to add a label to the commit title, like "gdb/testsuite: update gdb.rocm....".
| require no_optimized_code | ||
| require no_lto |
There was a problem hiding this comment.
We can combine these two into a single line.
| # so "next" lands on the brace instead. Either is fine for | ||
| # this test - the point is just to step past the local | ||
| # initialisation before reading blockIdx. | ||
| # initialization before reading blockIdx. |
There was a problem hiding this comment.
Change unrelated to this patch.
| load_lib rocm.exp | ||
|
|
||
| require allow_hipcc_tests | ||
| require no_optimized_code |
There was a problem hiding this comment.
Nit: Add a label to the commit title, like "gdb/testsuite: skip unsupported gdb.rocm tests...."
|
|
||
| /* Non-static so a line breakpoint at the "done ()" call site below | ||
| has a stable address at -O0. */ | ||
| /* Non-static marker function used by the test as a stable pending |
There was a problem hiding this comment.
This hunk should've been a fixup of the patch that originally introduced this line.
| __device__ static __attribute__ ((noinline)) MixedSmallStruct | ||
| returnMixedSmallSruct () | ||
| __device__ __attribute__ ((noinline)) MixedSmallStruct | ||
| returnMixedSmallStruct () |
There was a problem hiding this comment.
This typo fix and the removal of "static" are separate changes. Typo fix is obvious. But "static" is from a line that was already touched by a previous patch. I think it's better to split the patches and do focused things. It's ok to have minor typo fixes in one patch but having other changes in such a patch blurs the purpose.
| int thread_idx_x = threadIdx.x; | ||
| int thread_idx_y = threadIdx.y; | ||
| int thread_idx_z = threadIdx.z; | ||
| const int thread_idx_x = threadIdx.x; | ||
| const int thread_idx_y = threadIdx.y; | ||
| const int thread_idx_z = threadIdx.z; |
There was a problem hiding this comment.
This is another example of changing an already-changed portion. Can we please split the patch and apply hunks at the right patches as fixups?
| # without whitespace, like -flto, -O2, etc. Matching is whole-token | ||
| # only: e.g. "-O2" matches "-O2" but not "-O2=foo" or "-O20". |
There was a problem hiding this comment.
Again, this hunk should belong to another parent patch.
| # Executables built with optimization may stop at a different line here. | ||
| set pattern "([expr {$breakpoint_loc + 1}]|[expr {$breakpoint_loc + 2}])" | ||
| # Anchor with a non-digit so e.g. "Line 43" doesn't accept "Line 430". | ||
| if {[check_for_any_flags {-O1 -O2 -O3}]} { | ||
| set pattern "([expr {$breakpoint_loc + 1}]|[expr {$breakpoint_loc + 2}])" | ||
| set test_name "info line at breakpoint_loc + 1 or 2 lines" | ||
| } else { | ||
| set pattern "[expr {$breakpoint_loc + 1}]" | ||
| set test_name "info line at breakpoint_loc + 1" | ||
| } | ||
| # Anchor the matched line number with a non-digit so a two-digit | ||
| # number does not accidentally match a longer one (e.g. line 43 | ||
| # matching line 430). |
There was a problem hiding this comment.
This should've been a fixup on "Update some gdb.rocm tests to support optimized code".
lancesix
left a comment
There was a problem hiding this comment.
Hi,
I did not really re-read evertything, but it seems you push commits to this branch to fix previous commits of the branch still under review (gdb/testsuite/gdb.rocm: drop redundant test decorations for example).
We do not squash commits in GDB, but rather craft series and refine patches in the series as needed. The bits you change in that last patch have to be integrated into previous patches.
Summary
This series makes the
gdb.rocmtestsuite pass when the test programs arecompiled with optimization (
-O1/-O2/-O3,-flto) passed viaGDB_TESTCASE_OPTIONS, rather than the implicit-O0the suite assumes today.Three commits, split by concern:
Update
gdb.rocmframework to support optimized code — adds helpers ingdb/testsuite/lib/rocm.exp:check_for_any_flags {flags}— detects whetherGDB_TESTCASE_OPTIONScontains any of the listed flags.
gdb_caching_proc no_optimized_code— predicate for-O1/-O2/-O3.gdb_caching_proc no_lto— predicate for-flto.Update some
gdb.rocmtests to support optimized code (22 files) —two patterns applied surgically:
.cpp/.c: addvolatileon locals the test inspects, and__attribute__((optnone))on helper functions the test breakpoints into.Add a trivial
volatile intstatement to empty "break here" functionsso their body is not elided.
.exp: relax line-number arithmetic, backtrace PC formatting,expected source strings after
step, and add a bounded retry loop forextra ticks under optimization.
Skip unsupported tests in case of optimized code (19 files) — for tests
whose semantics cannot be reconciled with optimized code, add
require no_optimized_codeat the top;mi-aspace.expalso getsrequire no_lto.register-watchpoint.expsoft-skips the tail viacheck_for_any_flagsafter the earlier portion has run.Diffstat: 42 files changed, +146 / -46.
Context
This continues the work started in internal PR
AMD-ROCm-Internal/ROCgdb#6,
rebased onto the current
amd-stagingof the public repo. The commits arecontent-identical to the internal PR, preserving the original authors
(Aleksandar Rikalo, Andrey Kasaurov) with their
Co-Authored-By:trailersand adding my
Signed-off-by.