Skip to content

[cDAC] Copy all debuggee project-reference DLLs/PDBs into Helix symbols dir#128221

Merged
max-charlamb merged 7 commits into
dotnet:mainfrom
max-charlamb:fix/cdac-xplat-copy-project-ref-symbols
May 20, 2026
Merged

[cDAC] Copy all debuggee project-reference DLLs/PDBs into Helix symbols dir#128221
max-charlamb merged 7 commits into
dotnet:mainfrom
max-charlamb:fix/cdac-xplat-copy-project-ref-symbols

Conversation

@max-charlamb

@max-charlamb max-charlamb commented May 14, 2026

Copy link
Copy Markdown
Member

Note

Drafted with Copilot CLI.

Problem

The Helix symbol-prep step in cdac-dump-helix.proj copies only <DebuggeeName>/<DebuggeeName>.dll from each debuggee's publish output into the dumps/local/symbols/debuggees/<DebuggeeName>/ directory that ClrMD uses as a symbol search path. DLLs contributed by ProjectReference (e.g. InterpreterStack's Trampoline.dll, which lives next to InterpreterStack.dll in the publish folder) are silently dropped.

When the cDAC opens an x-plat Windows dump and ClrMD has to reload that module's metadata from disk (Windows mini-dump heap mode does not capture read-only image pages), the load fails with VirtualReadException in EcmaMetadata_1.GetMetadataProvider:

StackWalk_VerifyInterleavedStackLayout(config: local/jit (windows_x64)) [FAIL]
  VirtualReadException : Failed to read 2308 bytes at 0x7ff99f581610.
    at ContractDescriptorTarget.ReadBuffer
    at EcmaMetadata_1.GetMetadataProvider(ModuleHandle handle)
    at EcmaMetadata_1.GetMetadata(ModuleHandle handle)

This manifested as 3 failing InterpreterStackDumpTests on every host running the windows-x64 and windows-arm64 dumps in the CdacXPlatDumpTests stage of runtime-diagnostics. Linux dumps were unaffected because Linux core dumps include file-backed mappings, so ClrMD never had to fall back to disk.

Fix

Broaden the per-debuggee copy step from <DebuggeeName>.dll to *.dll so project-reference outputs (e.g. Trampoline.dll) are also captured. cDAC and ClrMD read ECMA-335 metadata directly from the PE, so only DLLs need to be copied — no PDBs are required.

Also add /Y to the Windows copy command to keep it non-interactive on a retry that reuses the same payload directory.

The publish output is framework-dependent (--no-self-contained) so only user DLLs are present; the shared framework lives in the correlation payload and is not affected by this change.

Testing

Validation requires a manual runtime-diagnostics xplat pipeline run — local repro requires Helix infrastructure.

Build 1421241 was an earlier validation run; a fresh run will be triggered on the latest commit before merging.

@dotnet-policy-service

Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the cDAC DumpTests Helix work item command script to stage all debuggee-produced managed binaries (and best-effort symbols) into the dumps/local/symbols/debuggees/<DebuggeeName>/ symbol search path, rather than copying only the primary <DebuggeeName>.dll.

Changes:

  • Windows: copy *.dll from each debuggee publish folder into the Helix symbols directory.
  • Windows: attempt to copy *.pdb (best-effort) from each debuggee publish folder into the Helix symbols directory.
  • Unix: copy *.dll and best-effort *.pdb from each debuggee publish folder into the Helix symbols directory.

Comment thread src/native/managed/cdac/tests/DumpTests/cdac-dump-helix.proj Outdated
@max-charlamb max-charlamb marked this pull request as ready for review May 15, 2026 03:31
Copilot AI review requested due to automatic review settings May 15, 2026 03:31

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/native/managed/cdac/tests/DumpTests/cdac-dump-helix.proj Outdated
Comment thread src/native/managed/cdac/tests/DumpTests/cdac-dump-helix.proj Outdated
@github-actions github-actions Bot mentioned this pull request May 17, 2026
Max Charlamb added 3 commits May 18, 2026 13:31
…ls dir

The Helix symbol-prep step in cdac-dump-helix.proj copies only
`%(DebuggeeName)\%(DebuggeeName).dll` from each debuggee's publish output
into the dumps/local/symbols/debuggees/<DebuggeeName>/ directory that
ClrMD uses as a symbol search path. Any DLLs contributed by ProjectReferences
(e.g. InterpreterStack's Trampoline.dll, which lives next to
InterpreterStack.dll in the publish folder) are silently dropped, so when the
cDAC opens an x-plat Windows dump and ClrMD has to reload that module's
metadata from disk (Windows mini-dump heap mode does not capture read-only
image pages), the load fails with VirtualReadException in
EcmaMetadata_1.GetMetadataProvider.

This manifested as 3 failing InterpreterStackDumpTests on every host running
the windows-x64 and windows-arm64 dumps in the CdacXPlatDumpTests stage —
linux dumps were unaffected because Linux core dumps include file-backed
mappings, so ClrMD never had to fall back to disk.

Switch the per-debuggee copy to `*.dll` (and best-effort `*.pdb`)
from the debuggee publish folder so project-reference outputs are also
captured. The publish output is framework-dependent (--no-self-contained)
so only user DLLs/PDBs are present; the shared framework lives in the
correlation payload and is not affected.
…issing PDBs

Windows: replace `copy ... 2>nul || (exit /b 0)` (which can terminate the
enclosing Helix batch script early) with `if exist ... copy ...` — a true
no-op when no PDB matches the wildcard.

Unix: replace `cp ... 2>/dev/null || true` and the previous attempt at a
`for ...; do ...; done` (which would have been split by the Arcade Helix
SDK SplitCommands on the bare `;`) with `find -maxdepth 1 -name *.pdb
-exec cp -t <dst> {} +` — naturally a no-op when find matches nothing, and
contains no semicolons.
- Windows: add `/Y` to both copy commands so a retried Helix work item that
  reuses the same payload directory doesn't sit at an overwrite prompt.
- Unix: quote the find `-name` pattern (`'*.pdb'`) so the shell doesn't
  pre-expand it before find runs, and replace GNU-only `cp -t <dest> {} +`
  with the portable form `cp {} <dest>/ +` so this also works on
  `TargetOS=osx` where `cp` is BSD and doesn't support `-t`.
The earlier `find ... -exec cp {} <dst>/ +` form failed in the
`runtime-diagnostics` pipeline on every Unix platform (linux-arm,
linux-arm64, linux-x64, osx-arm64, osx-x64) with a dash parser error:

    /root/helix/work/correlation/scripts/.../execute.sh: 95:
        Syntax error: word unexpected (expecting ")")

The Helix containers use `/bin/sh` -> dash, which doesn't tolerate the
shape of the `find` line as the Helix SDK ends up embedding it into
`execute.sh`.

Switch to the simpler `cp <src>/*.pdb <dst>/ 2>/dev/null || true`,
which is what the surrounding debuggee-invocation line already does
(see line ~123). The shell expands the glob; if no PDB matches, `cp`
prints to stderr (silenced) and exits non-zero (ignored). `|| true` is
safe in dash and does not affect the surrounding script's control flow.

This keeps the macOS / BSD portability the earlier review asked for
(no `cp -t`) without the shell-syntax issue.
Copilot AI review requested due to automatic review settings May 19, 2026 03:12

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread src/native/managed/cdac/tests/DumpTests/cdac-dump-helix.proj Outdated
@max-charlamb max-charlamb requested review from noahfalk and rcj1 May 19, 2026 18:28
@rcj1

rcj1 commented May 19, 2026

Copy link
Copy Markdown
Contributor

Why do we need the PDBs?

@max-charlamb

Copy link
Copy Markdown
Member Author

Why do we need the PDBs?

The interpreter test uses another DLL as a trampoline to get a stack that has multiple interpreter frames. This causes failures because the secondary DLLs pdb is not copied over. See the description for more info.

@rcj1

rcj1 commented May 19, 2026

Copy link
Copy Markdown
Contributor

Why do we need the PDBs?

The interpreter test uses another DLL as a trampoline to get a stack that has multiple interpreter frames. This causes failures because the secondary DLLs pdb is not copied over. See the description for more info.

What are you trying to read from the PDB (as opposed to the DLL)?

The cdac and ClrMD read ECMA-335 metadata directly from the PE
(`.dll`) — there is no managed source code / line-number assertion in
these dump tests that would require `.pdb` files. The original code in
PR dotnet#126385 (which set up this `symbols/` directory) intentionally only
copied DLLs for the same reason.

The actual bug this PR fixes is that `%(Identity)\%(Identity).dll` only
picks up the primary DLL and silently drops any project-reference output
that ends up in the same publish folder (e.g.
`InterpreterStack/Trampoline.dll`). When the cdac later asked ClrMD to
read metadata for `Trampoline.dll` from a Windows dump it couldn't find
the file and threw `VirtualReadException`. Switching the glob to
`*.dll` fixes that.

The PDB-copy scaffolding I added in earlier commits has been the source
of every CI failure on this PR (`|| (exit /b 0)` on Windows, `cp -t`
portability, dash shell parsing) and is not needed. Drop it.
Copilot AI review requested due to automatic review settings May 19, 2026 20:16
@max-charlamb

Copy link
Copy Markdown
Member Author

Why do we need the PDBs?

The interpreter test uses another DLL as a trampoline to get a stack that has multiple interpreter frames. This causes failures because the secondary DLLs pdb is not copied over. See the description for more info.

What are you trying to read from the PDB (as opposed to the DLL)?

Ahh that is correct. We should be using the DLL only. Previously we weren't sending the other DLL. I'll narrow the scope of this.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/native/managed/cdac/tests/DumpTests/cdac-dump-helix.proj Outdated
Comment thread src/native/managed/cdac/tests/DumpTests/cdac-dump-helix.proj Outdated
@max-charlamb max-charlamb merged commit 47f0519 into dotnet:main May 20, 2026
63 checks passed
@max-charlamb max-charlamb deleted the fix/cdac-xplat-copy-project-ref-symbols branch May 20, 2026 16:57
@dotnet-milestone-bot dotnet-milestone-bot Bot added this to the 11.0-preview6 milestone May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants