54 changes: 8 additions & 46 deletions docs/design/datacontracts/StackWalk.md
@@ -628,62 +628,24 @@ At each frame yielded by `Filter`, the walk determines whether to scan for GC re

See [GCRefMap Format and Resolution](#gcrefmap-format-and-resolution) for the GCRefMap scanning path and [Signature-Based Scanning](#signature-based-scanning) for the signature decoding path.

### Signature-Based Scanning
### Signature-Based Scanning (currently deferred)

When a transition frame's calling convention is not described by a precomputed GCRefMap (`PrestubMethodFrame`, `CallCountingHelperFrame`, and the fallback path for `StubDispatchFrame`/`ExternalMethodFrame`), the GC reference walk classifies caller-stack arguments by decoding the callee's method signature. This corresponds to native `TransitionFrame::PromoteCallerStack` (`src/coreclr/vm/frames.cpp`).
When a transition frame's calling convention is not described by a precomputed GCRefMap (`PrestubMethodFrame`, `CallCountingHelperFrame`, and the fallback path for `StubDispatchFrame`/`ExternalMethodFrame`), the native runtime classifies caller-stack arguments by decoding the callee's method signature (`TransitionFrame::PromoteCallerStack` in `src/coreclr/vm/frames.cpp`).

#### GcSignatureTypeProvider

`GcSignatureTypeProvider` is an `IRuntimeSignatureTypeProvider<GcTypeKind, GcSignatureContext>` that classifies each parameter type into one of:
The cDAC does **not** currently port this scan. `GcScanner.PromoteCallerStack` is a stub that records the frame as deferred and returns without enumerating any refs:

```csharp
internal enum GcTypeKind
private static void PromoteCallerStack(TargetPointer frameAddress, GcScanContext scanContext)
{
None, // Non-GC primitive that fits in a single slot
Ref, // Object reference (TYPE_GC_REF)
Interior, // Managed pointer / byref (TYPE_GC_BYREF)
Other, // Value type that may contain GC refs, or any type larger than a slot
scanContext.RecordDeferredFrame(frameAddress);
}
```

The provider is scoped to the method's containing module (captured at construction) so that `TypeDef` and `TypeRef` tokens can be resolved to a loaded `MethodTable` via the module's `TypeDefToMethodTable` / `TypeRefToMethodTable` lookup tables. The decoder's generic context is a `GcSignatureContext(TypeHandle classContext, MethodDescHandle methodContext)` carrying the method's class and method instantiations.

The provider classifies primitives directly (`String`/`Object` -> `Ref`, `TypedReference` -> `Other`, others -> `None`). For `TypeDef`/`TypeRef` it resolves the loaded `TypeHandle` and classifies via `RuntimeTypeSystem.GetSignatureCorElementType`, treating enums (`IsEnum`) as their underlying primitive (`None`). When the type cannot be resolved (e.g., not yet loaded), classification falls back to the signature's `rawTypeKind` (`ValueType` -> `Other`, otherwise `Ref`). Arrays are `Ref`, byrefs are `Interior`, raw pointers are `None`. Generic parameters (`!T`, `!!T`) are resolved against the `GcSignatureContext` (via `GetInstantiation` / `GetGenericMethodInstantiation`) and classified by their actual instantiation -- matching native `SigTypeContext`-driven `PeekElemTypeNormalized` behavior. `ELEMENT_TYPE_INTERNAL` resolves the `TypeHandle` via `RuntimeTypeSystem.GetSignatureCorElementType` and maps the `CorElementType` to a `GcTypeKind`.

#### PromoteCallerStack Algorithm

1. Read the `MethodDesc` pointer from the `FramedMethodFrame` and obtain a `MethodDescHandle` from `RuntimeTypeSystem`.
2. Resolve the method's `MetadataReader` via `Loader.GetModuleHandleFromModulePtr` and `EcmaMetadata.GetMetadata`. If metadata is unavailable, no caller-stack refs are reported (matches native fallback behavior).
3. Obtain the method's signature blob, matching native `MethodDesc::GetSig`:
- If `RuntimeTypeSystem.IsStoredSigMethodDesc` is true (dynamic, EEImpl, and array method descs), pin the stored signature span and pass a `BlobReader` over it to `RuntimeSignatureDecoder.DecodeMethodSignature`.
- Otherwise, look up the signature via the metadata token (`mdMethodDef`), skipping methods with a nil token (`0x06000000`).
4. Decode the signature with `RuntimeSignatureDecoder<GcTypeKind, GcSignatureContext>` and a `GcSignatureTypeProvider` constructed for the method's module. The `GcSignatureContext` passes the method's class and method instantiations so that `VAR`/`MVAR` placeholders resolve to their actual types. See [Signature contract](./Signature.md) for the decoder.
5. Skip varargs methods (the caller-stack layout is not described by the callee signature alone).
6. Compute the number of reserved register slots in the `TransitionBlock`:

| Reserved Slot | Condition |
|---|---|
| `this` pointer | `MethodSignature.Header.IsInstance` |
| Return buffer | Return type is `GcTypeKind.Other` |
| Generic instantiation arg | `RuntimeTypeSystem.RequiresInstArg(methodDesc)` |
| Async continuation | `RuntimeTypeSystem.IsAsyncMethod(methodDesc)` |
| ARM64 indirect-result register (`x8`) | Target architecture is ARM64 |

7. If `IsInstance`, report the `this` slot at position `0` (or `1` on ARM64 to skip `x8`). The slot is reported as `GC_CALL_INTERIOR` for value-type `this`, otherwise as a normal reference.
8. Walk `MethodSignature.ParameterTypes` starting at slot index = reserved slot count, advancing one slot per parameter:
- `GcTypeKind.Ref` -> report as a reference.
- `GcTypeKind.Interior` -> report with `GC_CALL_INTERIOR`.
- `GcTypeKind.Other` / `GcTypeKind.None` -> not reported (large value types are reported via the GCRefMap path when one is available; otherwise their interior refs are not visible to this scan).

The slot address is computed using the same formula as the GCRefMap path:

```csharp
slotAddress = transitionBlockPtr + FirstGCRefMapSlot + (position * pointerSize);
```
`RecordDeferredFrame` (on `GcScanContext`) appends a sentinel `StackRefData` entry with `Flags = GcScanFlags.CDAC_DEFERRED_FRAME (0x40000000)` and `Source = frameAddress`. The sentinel has no real GC ref payload; downstream consumers (e.g. the cDAC stress harness in `src/coreclr/vm/cdacstress.cpp`) can detect it and treat the missing refs at that frame as expected gaps rather than cDAC bugs. See [tests/StressTests/known-issues.md](../../../src/native/managed/cdac/tests/StressTests/known-issues.md) for the stress framework's handling and the tracking work to re-enable the scan.
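
As a rough illustration of how a consumer might separate these sentinels from real references, the following C++ sketch partitions a list of entries by the flag value quoted above. The `StackRef` struct, `SplitRefs`, and `SplitDeferred` are hypothetical names invented for this example; they are not the types used by `cdacstress.cpp` or the cDAC. Only the `0x40000000` flag value and the use of `Source` as the frame address come from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Flag value quoted in the text for GcScanFlags.CDAC_DEFERRED_FRAME.
constexpr std::uint32_t kCdacDeferredFrame = 0x40000000u;

// Hypothetical, simplified stand-in for a StackRefData entry.
struct StackRef {
    std::uint64_t address;  // slot address; carries no meaning for a sentinel
    std::uint64_t source;   // for a sentinel, the deferred frame's address
    std::uint32_t flags;
};

// Result of partitioning: real GC refs vs. frames whose scan was deferred.
struct SplitRefs {
    std::vector<StackRef> realRefs;
    std::vector<std::uint64_t> deferredFrames;
};

// Separates sentinel entries from real refs so that a comparison step can
// treat missing refs at deferred frames as expected gaps.
SplitRefs SplitDeferred(const std::vector<StackRef>& refs) {
    SplitRefs out;
    for (const StackRef& r : refs) {
        if ((r.flags & kCdacDeferredFrame) != 0) {
            out.deferredFrames.push_back(r.source);
        } else {
            out.realRefs.push_back(r);
        }
    }
    // Every input entry lands in exactly one of the two buckets.
    assert(out.realRefs.size() + out.deferredFrames.size() == refs.size());
    return out;
}
```

A harness built along these lines could compare only `realRefs` against the runtime's own enumeration and report `deferredFrames` separately, rather than counting them as mismatches.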

#### Limitations vs. Native
The `GcSignatureTypeProvider` class remains in the tree as the scaffolding the eventual port will use; it has no callers while `PromoteCallerStack` is stubbed.

This signature-based scan has known gaps relative to native; see [dotnet/runtime#127765](https://github.com/dotnet/runtime/issues/127765) for tracking.
Re-enabling the scan requires porting `ArgIterator` behind an `ICallingConvention` contract. Once that lands, `PromoteCallerStack` will expand into the signature-decoding algorithm (reserved-slot computation, signature walk, slot reporting) that mirrors the native version. See also [dotnet/runtime#127765](https://github.com/dotnet/runtime/issues/127765).

### GCRefMap Format and Resolution

5 changes: 5 additions & 0 deletions eng/Subsets.props
@@ -254,6 +254,7 @@
<SubsetName Include="Tools.ILLinkTests" OnDemand="true" Description="Unit tests for the tools.illink subset." />
<SubsetName Include="Tools.CdacTests" OnDemand="true" Description="Unit tests for the diagnostic data contract reader." />
<SubsetName Include="Tools.CdacDumpTests" OnDemand="true" Description="Dump-based integration tests for the diagnostic data contract reader." />
<SubsetName Include="Tools.CdacStressTests" OnDemand="true" Description="Stress integration tests for the diagnostic data contract reader." />
<SubsetName Include="Tools.ILAsm" OnDemand="true" Description="Build only the managed ilasm tool." />

<!-- Host -->
@@ -536,6 +537,10 @@
<ProjectToBuild Include="$(SharedNativeRoot)managed\cdac\tests\DumpTests\Microsoft.Diagnostics.DataContractReader.DumpTests.csproj" Test="true" Category="tools"/>
</ItemGroup>

<ItemGroup Condition="$(_subset.Contains('+tools.cdacstresstests+'))">
<ProjectToBuild Include="$(SharedNativeRoot)managed\cdac\tests\StressTests\Microsoft.Diagnostics.DataContractReader.StressTests.csproj" Test="true" Category="tools"/>
</ItemGroup>

<ItemGroup Condition="$(_subset.Contains('+tools.illink+'))">
<ProjectToBuild Include="$(ToolsProjectRoot)illink\src\linker\Mono.Linker.csproj" Category="tools" />
<ProjectToBuild Include="$(ToolsProjectRoot)illink\src\ILLink.Tasks\ILLink.Tasks.csproj" Category="tools" />
56 changes: 56 additions & 0 deletions eng/pipelines/cdac/prepare-cdac-stress-helix-steps.yml
@@ -0,0 +1,56 @@
# prepare-cdac-stress-helix-steps.yml - Steps for preparing cDAC stress test Helix payloads.
#
# Used by CdacStressTests stage in runtime-diagnostics.yml.
# Handles: building stress test debuggees, preparing Helix payload, finding testhost.

steps:
- script: $(Build.SourcesDirectory)$(dir).dotnet$(dir)dotnet$(exeExt) msbuild
$(Build.SourcesDirectory)/src/native/managed/cdac/tests/StressTests/Microsoft.Diagnostics.DataContractReader.StressTests.csproj
/t:BuildDebuggeesOnly
/p:Configuration=$(_BuildConfig)
/p:TargetArchitecture=$(archType)
-bl:$(Build.SourcesDirectory)/artifacts/log/BuildStressDebuggees.binlog
displayName: 'Build Stress Debuggees'

- script: $(Build.SourcesDirectory)$(dir).dotnet$(dir)dotnet$(exeExt) build
$(Build.SourcesDirectory)/src/native/managed/cdac/tests/StressTests/Microsoft.Diagnostics.DataContractReader.StressTests.csproj
/p:PrepareHelixPayload=true
/p:Configuration=$(_BuildConfig)
/p:HelixPayloadDir=$(Build.SourcesDirectory)/artifacts/helixPayload/cdac-stress
-bl:$(Build.SourcesDirectory)/artifacts/log/StressTestPayload.binlog
displayName: 'Prepare Stress Test Helix Payload'

- pwsh: |
$testhostDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*-$(osGroup)-*-$(archType)" | Select-Object -First 1 -ExpandProperty FullName
if (-not $testhostDir) {
Write-Error "No testhost directory found"
exit 1
}
Write-Host "TestHost root: $testhostDir"
Write-Host "##vso[task.setvariable variable=StressTestHostRootDir]$testhostDir"

# Diagnostic: list mscordaccore* files in the testhost shared framework dir
# and in artifacts/bin/coreclr/* so we can see whether the cDAC reader
# was built and copied into the test payload.
$sharedDir = Get-ChildItem -Directory -Path "$testhostDir/shared/Microsoft.NETCore.App/*" | Select-Object -First 1 -ExpandProperty FullName
Write-Host ""
Write-Host "--- Diagnostic: mscordaccore* files in testhost ($sharedDir) ---"
Get-ChildItem -Path $sharedDir -Filter "*mscordaccore*" -ErrorAction SilentlyContinue | Select-Object Name, Length | Format-Table -AutoSize | Out-String | Write-Host
Write-Host "--- Diagnostic: mscordaccore* files in artifacts/bin/coreclr ---"
Get-ChildItem -Path "$(Build.SourcesDirectory)/artifacts/bin/coreclr" -Recurse -Filter "*mscordaccore*" -ErrorAction SilentlyContinue | Select-Object FullName, Length | Format-Table -AutoSize | Out-String | Write-Host
Write-Host ""

$queue = switch ("$(osGroup)_$(archType)") {
"windows_x64" { "$(helix_windows_x64)" }
"windows_x86" { "$(helix_windows_x64)" }
"windows_arm64" { "$(helix_windows_arm64)" }
"linux_x64" { "$(helix_linux_x64_oldest)" }
"linux_arm64" { "$(helix_linux_arm64_oldest)" }
"linux_arm" { "$(helix_linux_arm32_oldest)" }
"osx_x64" { "$(helix_macos_x64)" }
"osx_arm64" { "$(helix_macos_arm64)" }
default { Write-Error "Unsupported platform: $(osGroup)_$(archType)"; exit 1 }
}
Write-Host "Helix queue: $queue"
Write-Host "##vso[task.setvariable variable=CdacStressHelixQueue]$queue"
displayName: 'Find Stress TestHost and Helix Queue'
56 changes: 56 additions & 0 deletions eng/pipelines/runtime-diagnostics.yml
@@ -37,6 +37,14 @@ parameters:
values:
- single-leg
- xplat
- name: cdacStressPlatforms
displayName: cDAC Stress Test Platforms
type: object
default:
- windows_x64
- linux_x64
- windows_arm64
- linux_arm64

resources:
repositories:
Expand Down Expand Up @@ -312,6 +320,54 @@ extends:
displayName: 'Fail on test errors'
condition: always()

#
# cDAC GC Stress Tests — runs in-process cDAC vs runtime stack-ref
# verification at GC stress points. Independent stage with its own build
# so its status/failures don't get conflated with the dump tests.
#
- ${{ if ne(variables['Build.Reason'], 'Schedule') }}:
- stage: CdacStressTests
dependsOn: []
jobs:
- template: /eng/pipelines/common/platform-matrix.yml
parameters:
jobTemplate: /eng/pipelines/common/global-build-job.yml
buildConfig: release
platforms: ${{ parameters.cdacStressPlatforms }}
shouldContinueOnError: true
jobParameters:
nameSuffix: CdacStressTest
buildArgs: -s clr+libs+tools.cdac+tools.cdacstresstests -c $(_BuildConfig) -rc checked -lc $(_BuildConfig)
timeoutInMinutes: 180
postBuildSteps:
- template: /eng/pipelines/cdac/prepare-cdac-stress-helix-steps.yml
- template: /eng/pipelines/common/templates/runtimes/send-to-helix-inner-step.yml
parameters:
displayName: 'Send cDAC Stress Tests to Helix'
sendParams: $(Build.SourcesDirectory)/src/native/managed/cdac/tests/StressTests/cdac-stress-helix.proj /t:Test /p:TargetOS=$(osGroup) /p:TargetArchitecture=$(archType) /p:HelixTargetQueues="$(CdacStressHelixQueue)" /p:TestHostPayload=$(StressTestHostRootDir) /p:StressTestsPayload=$(Build.SourcesDirectory)/artifacts/helixPayload/cdac-stress /bl:$(Build.SourcesDirectory)/artifacts/log/SendStressToHelix.binlog
environment:
_Creator: dotnet-bot
SYSTEM_ACCESSTOKEN: $(System.AccessToken)
NUGET_PACKAGES: $(Build.SourcesDirectory)$(dir).packages
- pwsh: |
if ("$(Agent.JobStatus)" -ne "Succeeded") {
Write-Error "One or more cDAC stress test failures were detected. Failing the job."
exit 1
}
displayName: 'Fail on test errors'
condition: always()
# On failure, publish the binaries needed to symbolicate the
# core dumps Helix collects automatically. Without these the
# dumps are unreadable -- libcoreclr.so, mscordaccore_universal,
# corerun and their .dbg/.pdb side files are required.
- task: PublishPipelineArtifact@1
inputs:
targetPath: '$(StressTestHostRootDir)'
artifactName: 'TestHost_CdacStress_$(osGroup)$(osSubgroup)_$(archType)_$(_BuildConfig)_Attempt$(System.JobAttempt)'
displayName: 'Publish TestHost for crash dump symbolication'
continueOnError: true
condition: failed()

#
# cDAC X-Plat Dump Generation and Testing — Two-stage flow:
# 1. Generate dumps on each platform via Helix, download and publish as artifacts
3 changes: 1 addition & 2 deletions src/coreclr/inc/clrconfigvalues.h
@@ -749,8 +749,7 @@ CONFIG_STRING_INFO(INTERNAL_PrestubHalt, W("PrestubHalt"), "")
RETAIL_CONFIG_STRING_INFO(EXTERNAL_RestrictedGCStressExe, W("RestrictedGCStressExe"), "")
RETAIL_CONFIG_DWORD_INFO(INTERNAL_CdacStressFailFast, W("CdacStressFailFast"), 0, "If nonzero, assert on cDAC/runtime GC ref mismatch during cDAC stress verification.")
RETAIL_CONFIG_STRING_INFO(INTERNAL_CdacStressLogFile, W("CdacStressLogFile"), "Log file path for cDAC stress verification results.")
RETAIL_CONFIG_DWORD_INFO(INTERNAL_CdacStressStep, W("CdacStressStep"), 1, "Verify every Nth cDAC stress point (1=every point, 100=every 100th). Reduces overhead while maintaining code path diversity.")
RETAIL_CONFIG_DWORD_INFO(INTERNAL_CdacStress, W("CdacStress"), 0, "Enable cDAC stress verification. Bit flags: 0x1=alloc points, 0x2=GC trigger points, 0x4=instruction points, 0x10=compare GC refs, 0x20=compare stack walk, 0x40=also use legacy DAC, 0x100=unique stacks only.")
RETAIL_CONFIG_DWORD_INFO(INTERNAL_CdacStress, W("CdacStress"), 0, "Enable cDAC stress verification. Bit flags: 0x1=alloc points, 0x200=verbose per-ref diagnostics.")
CONFIG_DWORD_INFO(INTERNAL_ReturnSourceTypeForTesting, W("ReturnSourceTypeForTesting"), 0, "Allows returning the (internal only) source type of an IL to Native mapping for debugging purposes")
RETAIL_CONFIG_DWORD_INFO(UNSUPPORTED_RSStressLog, W("RSStressLog"), 0, "Allows turning on logging for RS startup")
CONFIG_DWORD_INFO(INTERNAL_SBDumpOnNewIndex, W("SBDumpOnNewIndex"), 0, "Used for Syncblock debugging. It's been a while since any of those have been used.")
6 changes: 6 additions & 0 deletions src/coreclr/inc/dacprivate.h
@@ -65,6 +65,12 @@ enum
DACSTACKPRIV_REQUEST_FRAME_DATA = 0xf0000000
};

// Private requests for the cDAC stress harness.
enum
{
DACSTRESSPRIV_REQUEST_FLUSH_TARGET_STATE = 0xf2000000
};

enum DacpObjectType { OBJ_STRING=0,OBJ_FREE,OBJ_OBJECT,OBJ_ARRAY,OBJ_OTHER };
struct MSLAYOUT DacpObjectData
{
4 changes: 4 additions & 0 deletions src/coreclr/inc/switches.h
@@ -84,6 +84,10 @@
#define HAVE_GCCOVER
#endif

#if defined(_DEBUG)
#define CDAC_STRESS
#endif

// Some platforms may see spurious AVs when GcCoverage is enabled because of races.
// Enable further processing to see if they recur.
#if defined(HAVE_GCCOVER) && (defined(TARGET_X86) || defined(TARGET_AMD64)) && !defined(TARGET_UNIX)