diff --git a/docs/design/datacontracts/StackWalk.md b/docs/design/datacontracts/StackWalk.md index 923023a6026036..9ef6d502334c2e 100644 --- a/docs/design/datacontracts/StackWalk.md +++ b/docs/design/datacontracts/StackWalk.md @@ -628,62 +628,24 @@ At each frame yielded by `Filter`, the walk determines whether to scan for GC re See [GCRefMap Format and Resolution](#gcrefmap-format-and-resolution) for the GCRefMap scanning path and [Signature-Based Scanning](#signature-based-scanning) for the signature decoding path. -### Signature-Based Scanning +### Signature-Based Scanning (currently deferred) -When a transition frame's calling convention is not described by a precomputed GCRefMap (`PrestubMethodFrame`, `CallCountingHelperFrame`, and the fallback path for `StubDispatchFrame`/`ExternalMethodFrame`), the GC reference walk classifies caller-stack arguments by decoding the callee's method signature. This corresponds to native `TransitionFrame::PromoteCallerStack` (`src/coreclr/vm/frames.cpp`). +When a transition frame's calling convention is not described by a precomputed GCRefMap (`PrestubMethodFrame`, `CallCountingHelperFrame`, and the fallback path for `StubDispatchFrame`/`ExternalMethodFrame`), the native runtime classifies caller-stack arguments by decoding the callee's method signature (`TransitionFrame::PromoteCallerStack` in `src/coreclr/vm/frames.cpp`). -#### GcSignatureTypeProvider - -`GcSignatureTypeProvider` is an `IRuntimeSignatureTypeProvider` that classifies each parameter type into one of: +The cDAC does **not** currently port this scan. 
`GcScanner.PromoteCallerStack` is a stub that records the frame as deferred and returns without enumerating any refs: ```csharp -internal enum GcTypeKind +private static void PromoteCallerStack(TargetPointer frameAddress, GcScanContext scanContext) { - None, // Non-GC primitive that fits in a single slot - Ref, // Object reference (TYPE_GC_REF) - Interior, // Managed pointer / byref (TYPE_GC_BYREF) - Other, // Value type that may contain GC refs, or any type larger than a slot + scanContext.RecordDeferredFrame(frameAddress); } ``` -The provider is scoped to the method's containing module (captured at construction) so that `TypeDef` and `TypeRef` tokens can be resolved to a loaded `MethodTable` via the module's `TypeDefToMethodTable` / `TypeRefToMethodTable` lookup tables. The decoder's generic context is a `GcSignatureContext(TypeHandle classContext, MethodDescHandle methodContext)` carrying the method's class and method instantiations. - -The provider classifies primitives directly (`String`/`Object` -> `Ref`, `TypedReference` -> `Other`, others -> `None`). For `TypeDef`/`TypeRef` it resolves the loaded `TypeHandle` and classifies via `RuntimeTypeSystem.GetSignatureCorElementType`, treating enums (`IsEnum`) as their underlying primitive (`None`). When the type cannot be resolved (e.g., not yet loaded), classification falls back to the signature's `rawTypeKind` (`ValueType` -> `Other`, otherwise `Ref`). Arrays are `Ref`, byrefs are `Interior`, raw pointers are `None`. Generic parameters (`!T`, `!!T`) are resolved against the `GcSignatureContext` (via `GetInstantiation` / `GetGenericMethodInstantiation`) and classified by their actual instantiation -- matching native `SigTypeContext`-driven `PeekElemTypeNormalized` behavior. `ELEMENT_TYPE_INTERNAL` resolves the `TypeHandle` via `RuntimeTypeSystem.GetSignatureCorElementType` and maps the `CorElementType` to a `GcTypeKind`. - -#### PromoteCallerStack Algorithm - -1. 
Read the `MethodDesc` pointer from the `FramedMethodFrame` and obtain a `MethodDescHandle` from `RuntimeTypeSystem`. -2. Resolve the method's `MetadataReader` via `Loader.GetModuleHandleFromModulePtr` and `EcmaMetadata.GetMetadata`. If metadata is unavailable, no caller-stack refs are reported (matches native fallback behavior). -3. Obtain the method's signature blob, matching native `MethodDesc::GetSig`: - - If `RuntimeTypeSystem.IsStoredSigMethodDesc` is true (dynamic, EEImpl, and array method descs), pin the stored signature span and pass a `BlobReader` over it to `RuntimeSignatureDecoder.DecodeMethodSignature`. - - Otherwise, look up the signature via the metadata token (`mdMethodDef`), skipping methods with a nil token (`0x06000000`). -4. Decode the signature with `RuntimeSignatureDecoder` and a `GcSignatureTypeProvider` constructed for the method's module. The `GcSignatureContext` passes the method's class and method instantiations so that `VAR`/`MVAR` placeholders resolve to their actual types. See [Signature contract](./Signature.md) for the decoder. -5. Skip varargs methods (the caller-stack layout is not described by the callee signature alone). -6. Compute the number of reserved register slots in the `TransitionBlock`: - - | Reserved Slot | Condition | - |---|---| - | `this` pointer | `MethodSignature.Header.IsInstance` | - | Return buffer | Return type is `GcTypeKind.Other` | - | Generic instantiation arg | `RuntimeTypeSystem.RequiresInstArg(methodDesc)` | - | Async continuation | `RuntimeTypeSystem.IsAsyncMethod(methodDesc)` | - | ARM64 indirect-result register (`x8`) | Target architecture is ARM64 | - -7. If `IsInstance`, report the `this` slot at position `0` (or `1` on ARM64 to skip `x8`). The slot is reported as `GC_CALL_INTERIOR` for value-type `this`, otherwise as a normal reference. -8. 
Walk `MethodSignature.ParameterTypes` starting at slot index = reserved slot count, advancing one slot per parameter: - - `GcTypeKind.Ref` -> report as a reference. - - `GcTypeKind.Interior` -> report with `GC_CALL_INTERIOR`. - - `GcTypeKind.Other` / `GcTypeKind.None` -> not reported (large value types are reported via the GCRefMap path when one is available; otherwise their interior refs are not visible to this scan). - -The slot address is computed using the same formula as the GCRefMap path: - -```csharp -slotAddress = transitionBlockPtr + FirstGCRefMapSlot + (position * pointerSize); -``` +`RecordDeferredFrame` (on `GcScanContext`) appends a sentinel `StackRefData` entry with `Flags = GcScanFlags.CDAC_DEFERRED_FRAME (0x40000000)` and `Source = frameAddress`. The sentinel has no real GC ref payload; downstream consumers (e.g. the cDAC stress harness in `src/coreclr/vm/cdacstress.cpp`) can detect it and treat the missing refs at that frame as expected gaps rather than cDAC bugs. See [tests/StressTests/known-issues.md](../../../src/native/managed/cdac/tests/StressTests/known-issues.md) for the stress framework's handling and the tracking work to re-enable the scan. -#### Limitations vs. Native +The `GcSignatureTypeProvider` class remains in the tree as the scaffolding the eventual port will use; it has no callers while `PromoteCallerStack` is stubbed. -This signature-based scan has known gaps relative to native see [dotnet/runtime#127765](https://github.com/dotnet/runtime/issues/127765) for tracking. +Tracking work to re-enable the scan: it requires porting `ArgIterator` behind an `ICallingConvention` contract. Once that lands, `PromoteCallerStack` will fan out into the signature-decoding algorithm (reserved-slot computation, signature walk, slot reporting) that mirrors the native version. See also [dotnet/runtime#127765](https://github.com/dotnet/runtime/issues/127765). 
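The sentinel handling described above can be sketched on the consumer side as follows. This is a minimal illustration, not the harness's actual code: `SketchRef`, `Address`, and `SplitDeferredFrames` are hypothetical names introduced here, and the body of `IsDeferredFrame` is an assumed linear scan (the diff only shows its forward declaration). The `0x40000000` flag value and the 64-entry cap are taken from the constants added in `cdacstress.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the harness types; only the fields this sketch needs.
using Address = uint64_t;                        // stands in for CLRDATA_ADDRESS
struct SketchRef { Address Source; unsigned int Flags; };

// Values taken from the constants in the diff.
static const unsigned int CDAC_DEFERRED_FRAME = 0x40000000;
static const int MAX_DEFERRED_FRAMES = 64;

// Hypothetical helper: copy the Source of every sentinel entry into `deferred`,
// compact the real refs to the front of `refs`, and return the new ref count.
static int SplitDeferredFrames(SketchRef* refs, int refCount,
                               Address* deferred, int* deferredCount)
{
    int kept = 0;
    *deferredCount = 0;
    for (int i = 0; i < refCount; i++)
    {
        if ((refs[i].Flags & CDAC_DEFERRED_FRAME) != 0)
        {
            // Sentinel: no GC ref payload, only the address of the skipped frame.
            if (*deferredCount < MAX_DEFERRED_FRAMES)
                deferred[(*deferredCount)++] = refs[i].Source;
        }
        else
        {
            refs[kept++] = refs[i];
        }
    }
    return kept;
}

// Assumed body for the IsDeferredFrame declaration: a linear scan of the list.
static bool IsDeferredFrame(Address source, const Address* deferred, int deferredCount)
{
    for (int i = 0; i < deferredCount; i++)
    {
        if (deferred[i] == source)
            return true;
    }
    return false;
}
```

Under these assumptions, a runtime-only ref whose `Source` satisfies `IsDeferredFrame` would be counted as an expected gap rather than a mismatch, which is the distinction the sentinel exists to make.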
### GCRefMap Format and Resolution diff --git a/eng/Subsets.props b/eng/Subsets.props index 5033fa1bb0f124..b753c4422c1d55 100644 --- a/eng/Subsets.props +++ b/eng/Subsets.props @@ -254,6 +254,7 @@ + @@ -536,6 +537,10 @@ + + + + diff --git a/eng/pipelines/cdac/prepare-cdac-stress-helix-steps.yml b/eng/pipelines/cdac/prepare-cdac-stress-helix-steps.yml new file mode 100644 index 00000000000000..ef611a3807f821 --- /dev/null +++ b/eng/pipelines/cdac/prepare-cdac-stress-helix-steps.yml @@ -0,0 +1,56 @@ +# prepare-cdac-stress-helix-steps.yml - Steps for preparing cDAC stress test Helix payloads. +# +# Used by CdacStressTests stage in runtime-diagnostics.yml. +# Handles: building stress test debuggees, preparing Helix payload, finding testhost. + +steps: +- script: $(Build.SourcesDirectory)$(dir).dotnet$(dir)dotnet$(exeExt) msbuild + $(Build.SourcesDirectory)/src/native/managed/cdac/tests/StressTests/Microsoft.Diagnostics.DataContractReader.StressTests.csproj + /t:BuildDebuggeesOnly + /p:Configuration=$(_BuildConfig) + /p:TargetArchitecture=$(archType) + -bl:$(Build.SourcesDirectory)/artifacts/log/BuildStressDebuggees.binlog + displayName: 'Build Stress Debuggees' + +- script: $(Build.SourcesDirectory)$(dir).dotnet$(dir)dotnet$(exeExt) build + $(Build.SourcesDirectory)/src/native/managed/cdac/tests/StressTests/Microsoft.Diagnostics.DataContractReader.StressTests.csproj + /p:PrepareHelixPayload=true + /p:Configuration=$(_BuildConfig) + /p:HelixPayloadDir=$(Build.SourcesDirectory)/artifacts/helixPayload/cdac-stress + -bl:$(Build.SourcesDirectory)/artifacts/log/StressTestPayload.binlog + displayName: 'Prepare Stress Test Helix Payload' + +- pwsh: | + $testhostDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*-$(osGroup)-*-$(archType)" | Select-Object -First 1 -ExpandProperty FullName + if (-not $testhostDir) { + Write-Error "No testhost directory found" + exit 1 + } + Write-Host "TestHost root: $testhostDir" + Write-Host 
"##vso[task.setvariable variable=StressTestHostRootDir]$testhostDir" + + # Diagnostic: list mscordaccore* files in the testhost shared framework dir + # and in artifacts/bin/coreclr/* so we can see whether the cDAC reader + # was built and copied into the test payload. + $sharedDir = Get-ChildItem -Directory -Path "$testhostDir/shared/Microsoft.NETCore.App/*" | Select-Object -First 1 -ExpandProperty FullName + Write-Host "" + Write-Host "--- Diagnostic: mscordaccore* files in testhost ($sharedDir) ---" + Get-ChildItem -Path $sharedDir -Filter "*mscordaccore*" -ErrorAction SilentlyContinue | Select-Object Name, Length | Format-Table -AutoSize | Out-String | Write-Host + Write-Host "--- Diagnostic: mscordaccore* files in artifacts/bin/coreclr ---" + Get-ChildItem -Path "$(Build.SourcesDirectory)/artifacts/bin/coreclr" -Recurse -Filter "*mscordaccore*" -ErrorAction SilentlyContinue | Select-Object FullName, Length | Format-Table -AutoSize | Out-String | Write-Host + Write-Host "" + + $queue = switch ("$(osGroup)_$(archType)") { + "windows_x64" { "$(helix_windows_x64)" } + "windows_x86" { "$(helix_windows_x64)" } + "windows_arm64" { "$(helix_windows_arm64)" } + "linux_x64" { "$(helix_linux_x64_oldest)" } + "linux_arm64" { "$(helix_linux_arm64_oldest)" } + "linux_arm" { "$(helix_linux_arm32_oldest)" } + "osx_x64" { "$(helix_macos_x64)" } + "osx_arm64" { "$(helix_macos_arm64)" } + default { Write-Error "Unsupported platform: $(osGroup)_$(archType)"; exit 1 } + } + Write-Host "Helix queue: $queue" + Write-Host "##vso[task.setvariable variable=CdacStressHelixQueue]$queue" + displayName: 'Find Stress TestHost and Helix Queue' diff --git a/eng/pipelines/runtime-diagnostics.yml b/eng/pipelines/runtime-diagnostics.yml index ca38617222003e..f13e62a096b366 100644 --- a/eng/pipelines/runtime-diagnostics.yml +++ b/eng/pipelines/runtime-diagnostics.yml @@ -37,6 +37,14 @@ parameters: values: - single-leg - xplat +- name: cdacStressPlatforms + displayName: cDAC Stress Test Platforms 
+ type: object + default: + - windows_x64 + - linux_x64 + - windows_arm64 + - linux_arm64 resources: repositories: @@ -312,6 +320,54 @@ extends: displayName: 'Fail on test errors' condition: always() + # + # cDAC GC Stress Tests — runs in-process cDAC vs runtime stack-ref + # verification at GC stress points. Independent stage with its own build + # so its status/failures don't get conflated with the dump tests. + # + - ${{ if ne(variables['Build.Reason'], 'Schedule') }}: + - stage: CdacStressTests + dependsOn: [] + jobs: + - template: /eng/pipelines/common/platform-matrix.yml + parameters: + jobTemplate: /eng/pipelines/common/global-build-job.yml + buildConfig: release + platforms: ${{ parameters.cdacStressPlatforms }} + shouldContinueOnError: true + jobParameters: + nameSuffix: CdacStressTest + buildArgs: -s clr+libs+tools.cdac+tools.cdacstresstests -c $(_BuildConfig) -rc checked -lc $(_BuildConfig) + timeoutInMinutes: 180 + postBuildSteps: + - template: /eng/pipelines/cdac/prepare-cdac-stress-helix-steps.yml + - template: /eng/pipelines/common/templates/runtimes/send-to-helix-inner-step.yml + parameters: + displayName: 'Send cDAC Stress Tests to Helix' + sendParams: $(Build.SourcesDirectory)/src/native/managed/cdac/tests/StressTests/cdac-stress-helix.proj /t:Test /p:TargetOS=$(osGroup) /p:TargetArchitecture=$(archType) /p:HelixTargetQueues="$(CdacStressHelixQueue)" /p:TestHostPayload=$(StressTestHostRootDir) /p:StressTestsPayload=$(Build.SourcesDirectory)/artifacts/helixPayload/cdac-stress /bl:$(Build.SourcesDirectory)/artifacts/log/SendStressToHelix.binlog + environment: + _Creator: dotnet-bot + SYSTEM_ACCESSTOKEN: $(System.AccessToken) + NUGET_PACKAGES: $(Build.SourcesDirectory)$(dir).packages + - pwsh: | + if ("$(Agent.JobStatus)" -ne "Succeeded") { + Write-Error "One or more cDAC stress test failures were detected. Failing the job." 
+ exit 1 + } + displayName: 'Fail on test errors' + condition: always() + # On failure, publish the binaries needed to symbolicate the + # core dumps Helix collects automatically. Without these the + # dumps are unreadable -- libcoreclr.so, mscordaccore_universal, + # corerun and their .dbg/.pdb side files are required. + - task: PublishPipelineArtifact@1 + inputs: + targetPath: '$(StressTestHostRootDir)' + artifactName: 'TestHost_CdacStress_$(osGroup)$(osSubgroup)_$(archType)_$(_BuildConfig)_Attempt$(System.JobAttempt)' + displayName: 'Publish TestHost for crash dump symbolication' + continueOnError: true + condition: failed() + # # cDAC X-Plat Dump Generation and Testing — Two-stage flow: # 1. Generate dumps on each platform via Helix, download and publish as artifacts diff --git a/src/coreclr/inc/clrconfigvalues.h b/src/coreclr/inc/clrconfigvalues.h index ccc0b586789e68..68cd454496d94c 100644 --- a/src/coreclr/inc/clrconfigvalues.h +++ b/src/coreclr/inc/clrconfigvalues.h @@ -749,8 +749,7 @@ CONFIG_STRING_INFO(INTERNAL_PrestubHalt, W("PrestubHalt"), "") RETAIL_CONFIG_STRING_INFO(EXTERNAL_RestrictedGCStressExe, W("RestrictedGCStressExe"), "") RETAIL_CONFIG_DWORD_INFO(INTERNAL_CdacStressFailFast, W("CdacStressFailFast"), 0, "If nonzero, assert on cDAC/runtime GC ref mismatch during cDAC stress verification.") RETAIL_CONFIG_STRING_INFO(INTERNAL_CdacStressLogFile, W("CdacStressLogFile"), "Log file path for cDAC stress verification results.") -RETAIL_CONFIG_DWORD_INFO(INTERNAL_CdacStressStep, W("CdacStressStep"), 1, "Verify every Nth cDAC stress point (1=every point, 100=every 100th). Reduces overhead while maintaining code path diversity.") -RETAIL_CONFIG_DWORD_INFO(INTERNAL_CdacStress, W("CdacStress"), 0, "Enable cDAC stress verification. 
Bit flags: 0x1=alloc points, 0x2=GC trigger points, 0x4=instruction points, 0x10=compare GC refs, 0x20=compare stack walk, 0x40=also use legacy DAC, 0x100=unique stacks only.") +RETAIL_CONFIG_DWORD_INFO(INTERNAL_CdacStress, W("CdacStress"), 0, "Enable cDAC stress verification. Bit flags: 0x1=alloc points, 0x200=verbose per-ref diagnostics.") CONFIG_DWORD_INFO(INTERNAL_ReturnSourceTypeForTesting, W("ReturnSourceTypeForTesting"), 0, "Allows returning the (internal only) source type of an IL to Native mapping for debugging purposes") RETAIL_CONFIG_DWORD_INFO(UNSUPPORTED_RSStressLog, W("RSStressLog"), 0, "Allows turning on logging for RS startup") CONFIG_DWORD_INFO(INTERNAL_SBDumpOnNewIndex, W("SBDumpOnNewIndex"), 0, "Used for Syncblock debugging. It's been a while since any of those have been used.") diff --git a/src/coreclr/inc/dacprivate.h b/src/coreclr/inc/dacprivate.h index 95209e0d03ead9..19453dc8608663 100644 --- a/src/coreclr/inc/dacprivate.h +++ b/src/coreclr/inc/dacprivate.h @@ -65,6 +65,12 @@ enum DACSTACKPRIV_REQUEST_FRAME_DATA = 0xf0000000 }; +// Private requests for the cDAC stress harness. +enum +{ + DACSTRESSPRIV_REQUEST_FLUSH_TARGET_STATE = 0xf2000000 +}; + enum DacpObjectType { OBJ_STRING=0,OBJ_FREE,OBJ_OBJECT,OBJ_ARRAY,OBJ_OTHER }; struct MSLAYOUT DacpObjectData { diff --git a/src/coreclr/inc/switches.h b/src/coreclr/inc/switches.h index 447f1727ae7910..38e33513516be2 100644 --- a/src/coreclr/inc/switches.h +++ b/src/coreclr/inc/switches.h @@ -84,6 +84,10 @@ #define HAVE_GCCOVER #endif +#if defined(_DEBUG) +#define CDAC_STRESS +#endif + // Some platforms may see spurious AVs when GcCoverage is enabled because of races. // Enable further processing to see if they recur. 
#if defined(HAVE_GCCOVER) && (defined(TARGET_X86) || defined(TARGET_AMD64)) && !defined(TARGET_UNIX) diff --git a/src/coreclr/vm/cdacstress.cpp b/src/coreclr/vm/cdacstress.cpp index b750c069891af0..39f2715d2576f8 100644 --- a/src/coreclr/vm/cdacstress.cpp +++ b/src/coreclr/vm/cdacstress.cpp @@ -4,19 +4,19 @@ // // CdacStress.cpp // -// Implements in-process cDAC loading and stack reference verification. -// Enabled via DOTNET_CdacStress (bit flags) or legacy DOTNET_GCStress=0x20. -// At each enabled stress point we: -// 1. Ask the cDAC to enumerate stack GC references via ISOSDacInterface::GetStackReferences -// 2. Ask the runtime to enumerate stack GC references via StackWalkFrames + GcInfoDecoder -// 3. Compare the two sets and report any mismatches +// At each enabled stress point, asks the cDAC and the runtime to enumerate +// the current thread's stack GC refs and compares them. The runtime's own +// GC root enumeration (what the collector actually consumes) is the oracle. +// +// Enabled via DOTNET_CdacStress. // #include "common.h" -#ifdef HAVE_GCCOVER +#ifdef CDAC_STRESS #include "cdacstress.h" +#include "dacprivate.h" #include "../../native/managed/cdac/inc/cdac_reader.h" #include "../../debug/datadescriptor-shared/inc/contract-descriptor.h" #include @@ -27,12 +27,48 @@ #include "sstring.h" #include "exinfo.h" -// Forward-declare the 3-param GcEnumObject used as a GCEnumCallback. -// Defined in gcenv.ee.common.cpp; not exposed in any header. -extern void GcEnumObject(LPVOID pData, OBJECTREF *pObj, uint32_t flags); +#ifdef TARGET_LINUX +// process_vm_readv is the safe in-process read path on Linux. See +// ReadFromTargetCallback below for why PAL_TRY around memcpy is not viable. 
+#include +#include +#endif + +//----------------------------------------------------------------------------- +// Constants and configuration +//----------------------------------------------------------------------------- #define CDAC_LIB_NAME MAKEDLLNAME_W(W("mscordaccore_universal")) +// Sentinel flag set on cDAC StackRefData entries by RecordDeferredFrame to +// mark a frame whose ref scan was intentionally skipped (e.g. PromoteCallerStack +// pending the ArgIterator port). Mirrors GcScanFlags.CDAC_DEFERRED_FRAME. +static const unsigned int CDAC_DEFERRED_FRAME = 0x40000000; +static const int MAX_DEFERRED_FRAMES = 64; + +// Bit flags for DOTNET_CdacStress configuration. +enum CdacStressFlags : DWORD +{ + // Trigger points (where stress fires) + CDACSTRESS_ALLOC = 0x1, // Verify at allocation points + + // Modifiers + CDACSTRESS_VERBOSE = 0x200, // Rich per-ref diagnostics in the log +}; + +//----------------------------------------------------------------------------- +// Types +//----------------------------------------------------------------------------- + +// Identifies which collector produced a ref. Lets the logger derive its +// own side label (no need to thread "cDAC"/"RT" strings down through the +// comparison code). +enum RefSide : uint8_t +{ + SIDE_CDAC = 0, + SIDE_RT = 1, +}; + // Represents a single GC stack reference for comparison purposes. 
struct StackRef { @@ -41,57 +77,169 @@ struct StackRef unsigned int Flags; // SOSRefFlags (interior, pinned) CLRDATA_ADDRESS Source; // IP or Frame that owns this ref int SourceType; // SOS_StackSourceIP or SOS_StackSourceFrame - int Register; // Register number (cDAC only) + int Register; // Processor-encoding reg number, -1 for stack slots + // (cDAC populates from GcInfo; runtime populates + // by inverting GetRegisterSlot on supported arches) int Offset; // Register offset (cDAC only) CLRDATA_ADDRESS StackPointer; // Stack pointer at this ref (cDAC only) + RefSide Side; // Producer of this ref (cDAC vs runtime) }; -// Fixed-size buffer for collecting refs during stack walk. -// No heap allocation inside the promote callback — we're under NOTHROW contracts. -static const int MAX_COLLECTED_REFS = 4096; +static int IdentifyRegisterFromPpObj(REGDISPLAY* pRD, void* ppObj) +{ +#if defined(FEATURE_NATIVEAOT) + (void)pRD; (void)ppObj; + return -1; +#else + if (pRD == nullptr || pRD->pCurrentContextPointers == nullptr) + return -1; + KNONVOLATILE_CONTEXT_POINTERS* p = pRD->pCurrentContextPointers; + +#if defined(TARGET_AMD64) + PDWORD64* slots = (PDWORD64*)&p->Rax; + for (int r = 0; r < 16; r++) + { + if (r == 4) continue; // rsp + if ((void*)slots[r] == ppObj) + return r; + } +#elif defined(TARGET_ARM64) + // gcinfo encoding for ARM64: X0..X28 = 0..28, FP = 29, LR = 30, SP = 31. + // pCurrentContextPointers exposes only callee-saved (X19..X28, Fp, Lr). 
+ if ((void*)p->X19 == ppObj) return 19; + if ((void*)p->X20 == ppObj) return 20; + if ((void*)p->X21 == ppObj) return 21; + if ((void*)p->X22 == ppObj) return 22; + if ((void*)p->X23 == ppObj) return 23; + if ((void*)p->X24 == ppObj) return 24; + if ((void*)p->X25 == ppObj) return 25; + if ((void*)p->X26 == ppObj) return 26; + if ((void*)p->X27 == ppObj) return 27; + if ((void*)p->X28 == ppObj) return 28; + if ((void*)p->Fp == ppObj) return 29; + if ((void*)p->Lr == ppObj) return 30; +#elif defined(TARGET_ARM) + // gcinfo encoding for ARM: R0..R12 = 0..12, SP = 13, LR = 14, PC = 15. + // pCurrentContextPointers exposes only callee-saved (R4..R11, Lr). + if ((void*)p->R4 == ppObj) return 4; + if ((void*)p->R5 == ppObj) return 5; + if ((void*)p->R6 == ppObj) return 6; + if ((void*)p->R7 == ppObj) return 7; + if ((void*)p->R8 == ppObj) return 8; + if ((void*)p->R9 == ppObj) return 9; + if ((void*)p->R10 == ppObj) return 10; + if ((void*)p->R11 == ppObj) return 11; + if ((void*)p->Lr == ppObj) return 14; +#elif defined(TARGET_X86) + // gcinfo encoding for x86: EAX=0, ECX=1, EDX=2, EBX=3, ESP=4, EBP=5, ESI=6, EDI=7. + if ((void*)p->Eax == ppObj) return 0; + if ((void*)p->Ecx == ppObj) return 1; + if ((void*)p->Edx == ppObj) return 2; + if ((void*)p->Ebx == ppObj) return 3; + if ((void*)p->Ebp == ppObj) return 5; + if ((void*)p->Esi == ppObj) return 6; + if ((void*)p->Edi == ppObj) return 7; +#elif defined(TARGET_LOONGARCH64) + // gcinfo encoding for LoongArch64: Ra=1, Fp=22, S0..S8 = 23..31 + // (see GetRegName in src/coreclr/gcdump/gcdumpnonx86.cpp). 
+ if ((void*)p->Ra == ppObj) return 1; + if ((void*)p->Fp == ppObj) return 22; + if ((void*)p->S0 == ppObj) return 23; + if ((void*)p->S1 == ppObj) return 24; + if ((void*)p->S2 == ppObj) return 25; + if ((void*)p->S3 == ppObj) return 26; + if ((void*)p->S4 == ppObj) return 27; + if ((void*)p->S5 == ppObj) return 28; + if ((void*)p->S6 == ppObj) return 29; + if ((void*)p->S7 == ppObj) return 30; + if ((void*)p->S8 == ppObj) return 31; +#elif defined(TARGET_RISCV64) + // gcinfo encoding for RISCV64: Ra=1, Gp=3, Tp=4, Fp=8, S1=9, S2..S11 = 18..27 + // (see GetRegName in src/coreclr/gcdump/gcdumpnonx86.cpp). + if ((void*)p->Ra == ppObj) return 1; + if ((void*)p->Gp == ppObj) return 3; + if ((void*)p->Tp == ppObj) return 4; + if ((void*)p->Fp == ppObj) return 8; + if ((void*)p->S1 == ppObj) return 9; + if ((void*)p->S2 == ppObj) return 18; + if ((void*)p->S3 == ppObj) return 19; + if ((void*)p->S4 == ppObj) return 20; + if ((void*)p->S5 == ppObj) return 21; + if ((void*)p->S6 == ppObj) return 22; + if ((void*)p->S7 == ppObj) return 23; + if ((void*)p->S8 == ppObj) return 24; + if ((void*)p->S9 == ppObj) return 25; + if ((void*)p->S10 == ppObj) return 26; + if ((void*)p->S11 == ppObj) return 27; +#endif + return -1; +#endif // !FEATURE_NATIVEAOT +} + +//----------------------------------------------------------------------------- +// External symbols +//----------------------------------------------------------------------------- + +// Contract descriptor symbol exported from coreclr (consumed by the cDAC). +extern "C" struct ContractDescriptor DotNetRuntimeContractDescriptor; + +// 3-param GcEnumObject used as a GCEnumCallback. +// Defined in gcenv.ee.common.cpp; not exposed in any header. 
+extern void GcEnumObject(LPVOID pData, OBJECTREF *pObj, uint32_t flags); + +//----------------------------------------------------------------------------- +// Forward declarations +//----------------------------------------------------------------------------- + +static bool IsDeferredFrame(CLRDATA_ADDRESS source, const CLRDATA_ADDRESS* deferred, int deferredCount); +static void ResolveMethodName(CLRDATA_ADDRESS source, int sourceType, char* buf, int bufLen); + +//----------------------------------------------------------------------------- +// Static state — cDAC reader +//----------------------------------------------------------------------------- -// Static state — cDAC static HMODULE s_cdacModule = NULL; static intptr_t s_cdacHandle = 0; static IUnknown* s_cdacSosInterface = nullptr; static IXCLRDataProcess* s_cdacProcess = nullptr; // Cached QI result for Flush() static ISOSDacInterface* s_cdacSosDac = nullptr; // Cached QI result for GetStackReferences() -// Static state — legacy DAC (for three-way comparison) -static HMODULE s_dacModule = NULL; -static ISOSDacInterface* s_dacSosDac = nullptr; -static IXCLRDataProcess* s_dacProcess = nullptr; +//----------------------------------------------------------------------------- +// Static state — framework +//----------------------------------------------------------------------------- -// Static state — common static bool s_initialized = false; static bool s_failFast = true; -static DWORD s_step = 1; // Verify every Nth stress point (1=every point) static DWORD s_cdacStressLevel = 0; // Resolved CdacStressFlags static FILE* s_logFile = nullptr; static CrstStatic s_cdacLock; // Serializes cDAC access from concurrent GC stress threads -// Unique-stack filtering: hash set of previously seen stack traces. -// Protected by s_cdacLock (already held during VerifyAtStressPoint). 
+//----------------------------------------------------------------------------- +// Static state — verification counters (reported at shutdown) +//----------------------------------------------------------------------------- -static SHash>>* s_seenStacks = nullptr; +// Verification outcome counters. (Pass + Fail + KnownIssue) is the total +// number of stress points the harness ran to completion. +static volatile LONG s_passCount = 0; +static volatile LONG s_failCount = 0; +static volatile LONG s_knownIssueCount = 0; -// Thread-local reentrancy guard — prevents infinite recursion when -// allocations inside VerifyAtStressPoint trigger VerifyAtAllocPoint. -thread_local bool t_inVerification = false; +// Frame-level counters. Updated once per frame encountered during compare. +// frameTotal = frameMatch + frameMismatch + frameKnownNie. +static volatile LONG s_frameTotal = 0; +static volatile LONG s_frameMatch = 0; +static volatile LONG s_frameMismatch = 0; +static volatile LONG s_frameKnownNie = 0; -// Verification counters (reported at shutdown) -static volatile LONG s_verifyCount = 0; -static volatile LONG s_verifyPass = 0; -static volatile LONG s_verifyFail = 0; -static volatile LONG s_verifySkip = 0; +//----------------------------------------------------------------------------- +// Thread-local state +//----------------------------------------------------------------------------- -// Thread-local storage for the current thread context at the stress point. +// Current thread context at the stress point, consumed by the cDAC's +// ReadThreadContext callback. static thread_local PCONTEXT s_currentContext = nullptr; static thread_local DWORD s_currentThreadId = 0; -// Extern declaration for the contract descriptor symbol exported from coreclr. -extern "C" struct ContractDescriptor DotNetRuntimeContractDescriptor; - //----------------------------------------------------------------------------- // In-process callbacks for the cDAC reader. 
// These allow the cDAC to read memory from the current process. @@ -107,6 +255,24 @@ static void ReadFromTargetHelper(void* src, uint8_t* dest, uint32_t count) static int ReadFromTargetCallback(uint64_t addr, uint8_t* dest, uint32_t count, void* context) { +#ifdef TARGET_LINUX + // On Linux the PAL signal handler refuses to dispatch hardware exceptions + // when the faulting PC is in non-runtime code (see IsSafeToHandleHardwareException + // in exceptionhandling.cpp -- only managed code, virtual stubs, and marked JIT + // helpers qualify). If the cDAC asks us to read from an invalid address, the + // memcpy below would AV inside libc's __memcpy_advsimd, the signal handler + // would bail, and the whole process would abort -- a PAL_TRY around memcpy + // cannot catch it. + // + // process_vm_readv performs the copy in the kernel, returning EFAULT for + // unmapped pages instead of raising a signal. Same pattern used by + // createdump (crashinfounix.cpp:523). + void* src = reinterpret_cast(static_cast(addr)); + iovec local = { dest, count }; + iovec remote = { src, count }; + ssize_t bytesRead = process_vm_readv(getpid(), &local, 1, &remote, 1, 0); + return (bytesRead == (ssize_t)count) ? S_OK : E_FAIL; +#else void* src = reinterpret_cast(static_cast(addr)); struct Param { void* src; uint8_t* dest; uint32_t count; } param; param.src = src; param.dest = dest; param.count = count; @@ -120,6 +286,7 @@ static int ReadFromTargetCallback(uint64_t addr, uint8_t* dest, uint32_t count, } PAL_ENDTRY return S_OK; +#endif } static int WriteToTargetCallback(uint64_t addr, const uint8_t* buff, uint32_t count, void* context) @@ -142,163 +309,93 @@ static int ReadThreadContextCallback(uint32_t threadId, uint32_t contextFlags, u return E_FAIL; } -//----------------------------------------------------------------------------- -// Minimal ICLRDataTarget implementation for loading the legacy DAC in-process. -// Routes ReadVirtual/GetThreadContext to the same callbacks as the cDAC. 
-//----------------------------------------------------------------------------- -class InProcessDataTarget : public ICLRDataTarget, public ICLRRuntimeLocator -{ - volatile LONG m_refCount; -public: - InProcessDataTarget() : m_refCount(1) {} - virtual ~InProcessDataTarget() = default; - - HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, void** ppObj) override - { - if (riid == IID_IUnknown || riid == __uuidof(ICLRDataTarget)) - { - *ppObj = static_cast(this); - AddRef(); - return S_OK; - } - if (riid == __uuidof(ICLRRuntimeLocator)) - { - *ppObj = static_cast(this); - AddRef(); - return S_OK; - } - *ppObj = nullptr; - return E_NOINTERFACE; - } - ULONG STDMETHODCALLTYPE AddRef() override { return InterlockedIncrement(&m_refCount); } - ULONG STDMETHODCALLTYPE Release() override - { - ULONG c = InterlockedDecrement(&m_refCount); - if (c == 0) delete this; - return c; - } - - // ICLRRuntimeLocator — provides the CLR base address directly so the DAC - // does not fall back to GetImageBase (which needs GetModuleHandleW, unavailable on Linux). - HRESULT STDMETHODCALLTYPE GetRuntimeBase(CLRDATA_ADDRESS* baseAddress) override - { - *baseAddress = (CLRDATA_ADDRESS)GetCurrentModuleBase(); - return S_OK; - } - - HRESULT STDMETHODCALLTYPE GetMachineType(ULONG32* machineType) override - { -#ifdef TARGET_AMD64 - *machineType = IMAGE_FILE_MACHINE_AMD64; -#elif defined(TARGET_ARM64) - *machineType = IMAGE_FILE_MACHINE_ARM64; -#elif defined(TARGET_X86) - *machineType = IMAGE_FILE_MACHINE_I386; -#else - return E_NOTIMPL; -#endif - return S_OK; - } - - HRESULT STDMETHODCALLTYPE GetPointerSize(ULONG32* pointerSize) override - { - *pointerSize = sizeof(void*); - return S_OK; - } - - HRESULT STDMETHODCALLTYPE GetImageBase(LPCWSTR imagePath, CLRDATA_ADDRESS* baseAddress) override - { - // Not needed — the DAC uses ICLRRuntimeLocator::GetRuntimeBase() instead. 
- return E_NOTIMPL; - } - - HRESULT STDMETHODCALLTYPE ReadVirtual(CLRDATA_ADDRESS address, BYTE* buffer, ULONG32 bytesRequested, ULONG32* bytesRead) override - { - int hr = ReadFromTargetCallback((uint64_t)address, buffer, bytesRequested, nullptr); - if (hr == S_OK && bytesRead != nullptr) - *bytesRead = bytesRequested; - return hr; - } - - HRESULT STDMETHODCALLTYPE WriteVirtual(CLRDATA_ADDRESS, BYTE*, ULONG32, ULONG32*) override { return E_NOTIMPL; } - - HRESULT STDMETHODCALLTYPE GetTLSValue(ULONG32 threadId, ULONG32 index, CLRDATA_ADDRESS* value) override { return E_NOTIMPL; } - HRESULT STDMETHODCALLTYPE SetTLSValue(ULONG32 threadId, ULONG32 index, CLRDATA_ADDRESS value) override { return E_NOTIMPL; } - HRESULT STDMETHODCALLTYPE GetCurrentThreadID(ULONG32* threadId) override - { - *threadId = ::GetCurrentThreadId(); - return S_OK; - } - - HRESULT STDMETHODCALLTYPE GetThreadContext(ULONG32 threadId, ULONG32 contextFlags, ULONG32 contextSize, BYTE* contextBuffer) override - { - return ReadThreadContextCallback(threadId, contextFlags, contextSize, contextBuffer, nullptr); - } - - HRESULT STDMETHODCALLTYPE SetThreadContext(ULONG32, ULONG32, BYTE*) override { return E_NOTIMPL; } - HRESULT STDMETHODCALLTYPE Request(ULONG32, ULONG32, BYTE*, ULONG32, BYTE*) override { return E_NOTIMPL; } -}; - //----------------------------------------------------------------------------- // Initialization / Shutdown //----------------------------------------------------------------------------- -bool CdacStress::IsEnabled() +static bool IsCdacStressVerboseEnabled() { - // Check DOTNET_CdacStress first (new config) - DWORD cdacStress = CLRConfig::GetConfigValue(CLRConfig::INTERNAL_CdacStress); - if (cdacStress != 0) - return true; - - // Fall back to legacy DOTNET_GCStress=0x20 - return (g_pConfig->GetGCStressLevel() & EEConfig::GCSTRESS_CDAC) != 0; + return (s_cdacStressLevel & CDACSTRESS_VERBOSE) != 0; } -bool CdacStress::IsInitialized() +// Single-line file logger. 
Self-guards on s_logFile, so callers don't need to. +#define CDAC_LOG(...) \ + do { \ + if (s_logFile != nullptr) \ + fprintf(s_logFile, __VA_ARGS__); \ + } while (0) + +// Diagnostic emitter that always reaches stderr (and the log file when open). +// Use for init / library-load errors visible in CI. Every line is prefixed +// with "CDAC GC Stress: ". +#define CDAC_ERR(...) \ + do { \ + fprintf(stderr, "CDAC GC Stress: "); \ + fprintf(stderr, __VA_ARGS__); \ + if (s_logFile != nullptr) { \ + fprintf(s_logFile, "CDAC GC Stress: "); \ + fprintf(s_logFile, __VA_ARGS__); \ + } \ + } while (0) + +// Forward declarations for helpers defined later. Implementations live in +// the "Rendering helpers" section at the bottom of the file. +static const char* RegisterName(int reg); +static const char* FormatRefFlags(unsigned int flags, char* buf, size_t bufLen); + +// Per-ref disposition coming out of a frame compare. +enum RefDisposition : uint8_t { - return s_initialized; -} + REF_MATCHED = 0, // paired with a ref on the opposite side + REF_ONLY = 1, // present on this side, absent on the other + REF_NIE = 2, // only-side, but Source is on the deferred list + // (only meaningful for SIDE_RT) +}; -DWORD GetCdacStressLevel() -{ - return s_cdacStressLevel; -} +static const char* SideName(RefSide s); +static const char* DispositionName(RefDisposition d); +static void LogRefConcise(RefDisposition disp, const StackRef& r); +static void LogRefVerbose(RefDisposition disp, const StackRef& r); +static void LogRef(RefDisposition disp, const StackRef& r); -bool CdacStress::IsUniqueEnabled() +void CdacStressPolicy::Initialize() { - return (s_cdacStressLevel & CDACSTRESS_UNIQUE) != 0; -} + if (s_initialized) + return; + DWORD cdacStressLevel = CLRConfig::GetConfigValue(CLRConfig::INTERNAL_CdacStress); + if (cdacStressLevel == 0) + return; -bool CdacStress::Initialize() -{ - if (!IsEnabled()) - return false; - - // Resolve the stress level from DOTNET_CdacStress or legacy GCSTRESS_CDAC - 
DWORD cdacStress = CLRConfig::GetConfigValue(CLRConfig::INTERNAL_CdacStress); - if (cdacStress != 0) - { - s_cdacStressLevel = cdacStress; - } - else - { - // Legacy: GCSTRESS_CDAC maps to allocation-point + reference verification - s_cdacStressLevel = CDACSTRESS_ALLOC | CDACSTRESS_REFS; - } + // Record the requested stress level early so internal helpers + // (e.g. IsCdacStressVerboseEnabled) work during the rest of init. + // Triggers (CdacStress::IsEnabled) are gated by s_initialized, + // which is set only after init completes successfully. + s_cdacStressLevel = cdacStressLevel; // Load mscordaccore_universal from next to coreclr PathString path; - if (WszGetModuleFileName(reinterpret_cast(GetCurrentModuleBase()), path) == 0) + // On Unix, GetCurrentModuleBase() returns a raw dladdr base address, not a + // PAL HMODULE -- WszGetModuleFileName will return 0 for it. The DAC has + // the same problem and uses PAL_GetPalHostModule() (which is the coreclr + // host module, exactly where cdacstress.cpp lives). Mirror that pattern. +#ifdef HOST_UNIX + HMODULE hCoreclr = PAL_GetPalHostModule(); +#else + HMODULE hCoreclr = reinterpret_cast(GetCurrentModuleBase()); +#endif + if (hCoreclr == NULL || WszGetModuleFileName(hCoreclr, path) == 0) { - LOG((LF_GCROOTS, LL_WARNING, "CDAC GC Stress: Failed to get module file name\n")); - return false; + CDAC_ERR("Failed to get coreclr module file name (WszGetModuleFileName returned 0).\n"); + return; } SString::Iterator iter = path.End(); if (!path.FindBack(iter, DIRECTORY_SEPARATOR_CHAR_W)) { - LOG((LF_GCROOTS, LL_WARNING, "CDAC GC Stress: Failed to find directory separator\n")); - return false; + MAKE_UTF8PTR_FROMWIDE_NOTHROW(pathUtf8Sep, path.GetUnicode()); + CDAC_ERR("Failed to find directory separator in module path '%s'.\n", + pathUtf8Sep != nullptr ? 
pathUtf8Sep : ""); + return; } iter++; @@ -308,18 +405,21 @@ bool CdacStress::Initialize() s_cdacModule = CLRLoadLibrary(path.GetUnicode()); if (s_cdacModule == NULL) { - LOG((LF_GCROOTS, LL_WARNING, "CDAC GC Stress: Failed to load %S\n", path.GetUnicode())); - return false; + MAKE_UTF8PTR_FROMWIDE_NOTHROW(pathUtf8, path.GetUnicode()); + CDAC_ERR("Failed to load cDAC library at '%s' " + "(check that mscordaccore_universal is shipped next to coreclr).\n", + pathUtf8 != nullptr ? pathUtf8 : ""); + return; } // Resolve cdac_reader_init auto init = reinterpret_cast(::GetProcAddress(s_cdacModule, "cdac_reader_init")); if (init == nullptr) { - LOG((LF_GCROOTS, LL_WARNING, "CDAC GC Stress: Failed to resolve cdac_reader_init\n")); + CDAC_ERR("Failed to resolve cdac_reader_init symbol.\n"); ::FreeLibrary(s_cdacModule); s_cdacModule = NULL; - return false; + return; } // Get the address of the contract descriptor in our own process @@ -328,10 +428,11 @@ bool CdacStress::Initialize() // Initialize the cDAC reader with in-process callbacks (no alloc_virtual for in-process stress) if (init(descriptorAddr, &ReadFromTargetCallback, &WriteToTargetCallback, &ReadThreadContextCallback, nullptr, nullptr, &s_cdacHandle) != 0) { - LOG((LF_GCROOTS, LL_WARNING, "CDAC GC Stress: cdac_reader_init failed\n")); + CDAC_ERR("cdac_reader_init failed (descriptorAddr=0x%llx).\n", + (unsigned long long)descriptorAddr); ::FreeLibrary(s_cdacModule); s_cdacModule = NULL; - return false; + return; } // Create the SOS interface @@ -339,48 +440,43 @@ bool CdacStress::Initialize() ::GetProcAddress(s_cdacModule, "cdac_reader_create_sos_interface")); if (createSos == nullptr) { - LOG((LF_GCROOTS, LL_WARNING, "CDAC GC Stress: Failed to resolve cdac_reader_create_sos_interface\n")); + CDAC_ERR("Failed to resolve cdac_reader_create_sos_interface symbol.\n"); auto freeFn = reinterpret_cast(::GetProcAddress(s_cdacModule, "cdac_reader_free")); if (freeFn != nullptr) freeFn(s_cdacHandle); 
::FreeLibrary(s_cdacModule); s_cdacModule = NULL; s_cdacHandle = 0; - return false; + return; } if (createSos(s_cdacHandle, nullptr, &s_cdacSosInterface) != 0) { - LOG((LF_GCROOTS, LL_WARNING, "CDAC GC Stress: cdac_reader_create_sos_interface failed\n")); + CDAC_ERR("cdac_reader_create_sos_interface failed.\n"); auto freeFn = reinterpret_cast(::GetProcAddress(s_cdacModule, "cdac_reader_free")); if (freeFn != nullptr) freeFn(s_cdacHandle); ::FreeLibrary(s_cdacModule); s_cdacModule = NULL; s_cdacHandle = 0; - return false; + return; } // Read configuration for fail-fast behavior s_failFast = CLRConfig::GetConfigValue(CLRConfig::INTERNAL_CdacStressFailFast) != 0; - // Read step interval for throttling verifications - s_step = CLRConfig::GetConfigValue(CLRConfig::INTERNAL_CdacStressStep); - if (s_step == 0) - s_step = 1; - // Cache QI results so we don't QI on every stress point { HRESULT hr = s_cdacSosInterface->QueryInterface(__uuidof(IXCLRDataProcess), reinterpret_cast(&s_cdacProcess)); if (FAILED(hr) || s_cdacProcess == nullptr) { - LOG((LF_GCROOTS, LL_WARNING, "CDAC GC Stress: Failed to QI for IXCLRDataProcess (hr=0x%08x)\n", hr)); + CDAC_ERR("Failed to QI for IXCLRDataProcess (hr=0x%08x)\n", hr); } hr = s_cdacSosInterface->QueryInterface(__uuidof(ISOSDacInterface), reinterpret_cast(&s_cdacSosDac)); if (FAILED(hr) || s_cdacSosDac == nullptr) { - LOG((LF_GCROOTS, LL_WARNING, "CDAC GC Stress: Failed to QI for ISOSDacInterface (hr=0x%08x) - cannot verify\n", hr)); + CDAC_ERR("Failed to QI for ISOSDacInterface (hr=0x%08x) - cannot verify\n", hr); if (s_cdacProcess != nullptr) { s_cdacProcess->Release(); @@ -392,7 +488,7 @@ bool CdacStress::Initialize() ::FreeLibrary(s_cdacModule); s_cdacModule = NULL; s_cdacHandle = 0; - return false; + return; } } @@ -405,92 +501,52 @@ bool CdacStress::Initialize() if (s_logFile != nullptr) { fprintf(s_logFile, "=== cDAC GC Stress Verification Log ===\n"); - fprintf(s_logFile, "FailFast: %s\n", s_failFast ? 
"true" : "false"); - fprintf(s_logFile, "Step: %u (verify every %u stress points)\n\n", s_step, s_step); + fprintf(s_logFile, "FailFast: %s\n\n", s_failFast ? "true" : "false"); } - } - - s_cdacLock.Init(CrstGCCover, CRST_DEFAULT); - - if (IsUniqueEnabled()) - { - s_seenStacks = new SHash>>(); - } - - // Load the legacy DAC for three-way comparison (optional — non-fatal if it fails). - { - PathString dacPath; - if (WszGetModuleFileName(reinterpret_cast(GetCurrentModuleBase()), dacPath) != 0) + else { - SString::Iterator dacIter = dacPath.End(); - if (dacPath.FindBack(dacIter, DIRECTORY_SEPARATOR_CHAR_W)) - { - dacIter++; - dacPath.Truncate(dacIter); - dacPath.Append(W("mscordaccore.dll")); - - s_dacModule = CLRLoadLibrary(dacPath.GetUnicode()); - if (s_dacModule != NULL) - { - typedef HRESULT (STDAPICALLTYPE *PFN_CLRDataCreateInstance)(REFIID, ICLRDataTarget*, void**); - auto pfnCreate = reinterpret_cast( - ::GetProcAddress(s_dacModule, "CLRDataCreateInstance")); - if (pfnCreate != nullptr) - { - InProcessDataTarget* pTarget = new (nothrow) InProcessDataTarget(); - if (pTarget != nullptr) - { - IUnknown* pDacUnk = nullptr; - HRESULT hr = pfnCreate(__uuidof(IUnknown), pTarget, (void**)&pDacUnk); - pTarget->Release(); - if (SUCCEEDED(hr) && pDacUnk != nullptr) - { - pDacUnk->QueryInterface(__uuidof(ISOSDacInterface), (void**)&s_dacSosDac); - pDacUnk->QueryInterface(__uuidof(IXCLRDataProcess), (void**)&s_dacProcess); - pDacUnk->Release(); - } - } - } - if (s_dacSosDac == nullptr) - { - LOG((LF_GCROOTS, LL_WARNING, "CDAC GC Stress: Legacy DAC loaded but QI for ISOSDacInterface failed\n")); - } - } - else - { - LOG((LF_GCROOTS, LL_INFO10, "CDAC GC Stress: Legacy DAC not found (three-way comparison disabled)\n")); - } - } + CDAC_ERR("Failed to open log file '%s' (errno may indicate missing directory).\n", + sLogPath.GetUTF8()); } } + s_cdacLock.Init(CrstGCCover, CRST_DEFAULT); + + // Activate triggers only after everything is fully initialized. 
s_initialized = true; - LOG((LF_GCROOTS, LL_INFO10, "CDAC GC Stress: Initialized successfully (failFast=%d, logFile=%s)\n", - s_failFast, s_logFile != nullptr ? "yes" : "no")); - return true; } -void CdacStress::Shutdown() +void CdacStressPolicy::Shutdown() { if (!s_initialized) return; // Print summary to stderr so results are always visible - LONG actualVerifications = s_verifyPass + s_verifyFail + s_verifySkip; - fprintf(stderr, "CDAC GC Stress: %ld stress points, %ld verifications (%ld pass / %ld fail, %ld skipped)\n", - (long)s_verifyCount, (long)actualVerifications, (long)s_verifyPass, (long)s_verifyFail, (long)s_verifySkip); + LONG totalVerifications = s_passCount + s_failCount + s_knownIssueCount; + fprintf(stderr, + "CDAC GC Stress: %ld verifications " + "(%ld pass / %ld fail / %ld known-issue)\n", + (long)totalVerifications, + (long)s_passCount, (long)s_failCount, (long)s_knownIssueCount); + fprintf(stderr, + "CDAC GC Stress: %ld frames examined " + "(%ld matched / %ld mismatched / %ld known-NIE)\n", + (long)s_frameTotal, (long)s_frameMatch, (long)s_frameMismatch, (long)s_frameKnownNie); STRESS_LOG3(LF_GCROOTS, LL_ALWAYS, "CDAC GC Stress shutdown: %d verifications (%d pass / %d fail)\n", - (int)actualVerifications, (int)s_verifyPass, (int)s_verifyFail); + (int)totalVerifications, (int)s_passCount, (int)s_failCount); if (s_logFile != nullptr) { fprintf(s_logFile, "\n=== Summary ===\n"); - fprintf(s_logFile, "Total stress points: %ld\n", (long)s_verifyCount); - fprintf(s_logFile, "Total verifications: %ld\n", (long)actualVerifications); - fprintf(s_logFile, " Passed: %ld\n", (long)s_verifyPass); - fprintf(s_logFile, " Failed: %ld\n", (long)s_verifyFail); - fprintf(s_logFile, " Skipped: %ld\n", (long)s_verifySkip); + fprintf(s_logFile, "Total verifications: %ld\n", (long)totalVerifications); + fprintf(s_logFile, " Passed: %ld\n", (long)s_passCount); + fprintf(s_logFile, " Failed: %ld\n", (long)s_failCount); + fprintf(s_logFile, " Known issues: %ld\n", 
(long)s_knownIssueCount); + fprintf(s_logFile, "Frames examined: %ld\n", (long)s_frameTotal); + fprintf(s_logFile, " Matched: %ld\n", (long)s_frameMatch); + fprintf(s_logFile, " Mismatched: %ld\n", (long)s_frameMismatch); + fprintf(s_logFile, " Known NIE: %ld\n", (long)s_frameKnownNie); fclose(s_logFile); s_logFile = nullptr; } @@ -521,34 +577,38 @@ void CdacStress::Shutdown() s_cdacHandle = 0; } - // Legacy DAC cleanup - if (s_dacSosDac != nullptr) { s_dacSosDac->Release(); s_dacSosDac = nullptr; } - if (s_dacProcess != nullptr) { s_dacProcess->Release(); s_dacProcess = nullptr; } - - if (s_seenStacks != nullptr) - { - delete s_seenStacks; - s_seenStacks = nullptr; - } - s_initialized = false; + s_cdacStressLevel = 0; LOG((LF_GCROOTS, LL_INFO10, "CDAC GC Stress: Shutdown complete\n")); } +//----------------------------------------------------------------------------- +// Trigger gates -- one specialization per cdac_trigger_points value. +// +// IsEnabled is also gated on s_initialized so the patch-installing call sites +// stay inactive until initialization has completed successfully. +//----------------------------------------------------------------------------- + +bool CdacStress::IsEnabled() +{ + return s_initialized && (s_cdacStressLevel & CDACSTRESS_ALLOC) != 0; +} + + //----------------------------------------------------------------------------- // Collect stack refs from the cDAC //----------------------------------------------------------------------------- -static bool CollectStackRefs(ISOSDacInterface* pSosDac, DWORD osThreadId, SArray* pRefs) +static HRESULT CollectCdacStackRefs(ISOSDacInterface* pSosDac, DWORD osThreadId, SArray* pRefs) { if (pSosDac == nullptr) - return false; + return E_POINTER; ISOSStackRefEnum* pEnum = nullptr; HRESULT hr = pSosDac->GetStackReferences(osThreadId, &pEnum); - - if (FAILED(hr) || pEnum == nullptr) - return false; + if (FAILED(hr)) + return hr; + if (pEnum == nullptr) + return E_POINTER; SOSStackRefData refData; unsigned int fetched = 0; @@ -564,14 +624,15 @@ static bool
CollectStackRefs(ISOSDacInterface* pSosDac, DWORD osThreadId, SArray ref.Flags = refData.Flags; ref.Source = refData.Source; ref.SourceType = refData.SourceType; - ref.Register = refData.Register; + ref.Register = refData.HasRegisterInformation ? (int)refData.Register : -1; ref.Offset = refData.Offset; ref.StackPointer = refData.StackPointer; + ref.Side = SIDE_CDAC; pRefs->Append(ref); } pEnum->Release(); - return true; + return S_OK; } //----------------------------------------------------------------------------- @@ -580,9 +641,24 @@ static bool CollectStackRefs(ISOSDacInterface* pSosDac, DWORD osThreadId, SArray struct RuntimeRefCollectionContext { - StackRef refs[MAX_COLLECTED_REFS]; - int count; + SArray* refs; // caller-owned, appended during walk bool overflow; + + // Per-frame attribution: updated by the crawl callback before each + // EnumGcRefs/GcScanRoots call so the inner promote callback can stamp + // every ref with the producing frame. + // + // Convention matches DAC (DacStackReferenceWalker, daccess.cpp:7488-7498) + // and cDAC (GcScanContext.cs:89-97): + // - Frameless JIT frame: Source = native PC at the safepoint, SourceType = 0 (IP) + // - Explicit Frame: Source = Frame*, SourceType = 1 (Frame) + CLRDATA_ADDRESS currentFrameSource; + int currentFrameSourceType; + + // REGDISPLAY for the current frame (frameless only). Used by the promote + // callback to invert GetRegisterSlot and recover the register number for + // register-resident refs. nullptr for explicit Frames. 
+ REGDISPLAY* currentRegDisplay; }; static void CollectRuntimeRefsPromoteFunc(PTR_PTR_Object ppObj, ScanContext* sc, uint32_t flags) @@ -590,19 +666,10 @@ static void CollectRuntimeRefsPromoteFunc(PTR_PTR_Object ppObj, ScanContext* sc, RuntimeRefCollectionContext* ctx = reinterpret_cast(sc->_unused1); if (ctx == nullptr) return; - if (ctx->count >= MAX_COLLECTED_REFS) - { - ctx->overflow = true; + if (ctx->overflow) return; - } - StackRef& ref = ctx->refs[ctx->count++]; - - // Always report the real ppObj address. For register-based refs, ppObj points - // into the REGDISPLAY/CONTEXT on the native stack — we can't reliably distinguish - // these from managed stack slots on the runtime side. The comparison logic handles - // this by matching register refs (cDAC Address=0) by (Object, Flags) only. - ref.Address = reinterpret_cast(ppObj); + StackRef ref; ref.Object = reinterpret_cast(*ppObj); ref.Flags = 0; @@ -610,15 +677,52 @@ static void CollectRuntimeRefsPromoteFunc(PTR_PTR_Object ppObj, ScanContext* sc, ref.Flags |= SOSRefInterior; if (flags & GC_CALL_PINNED) ref.Flags |= SOSRefPinned; - ref.Source = 0; - ref.SourceType = 0; + + // Per-frame attribution from the enclosing crawl callback. + ref.Source = ctx->currentFrameSource; + ref.SourceType = ctx->currentFrameSourceType; + + int recoveredReg = IdentifyRegisterFromPpObj(ctx->currentRegDisplay, (void*)ppObj); + if (recoveredReg >= 0) + { + ref.Address = 0; + ref.Register = recoveredReg; + } + else + { + ref.Address = reinterpret_cast(ppObj); + ref.Register = -1; + } + ref.Offset = 0; + ref.StackPointer = 0; + ref.Side = SIDE_RT; + + // SArray::Append can throw OutOfMemoryException. The runtime stack walker + // CONTRACTs NOTHROW so we must not let an exception escape this callback. + // On OOM, set the overflow flag so the caller treats the run as FAIL. 
+ EX_TRY + { + ctx->refs->Append(ref); + } + EX_CATCH + { + ctx->overflow = true; + } + EX_END_CATCH } -static bool CollectRuntimeStackRefs(Thread* pThread, PCONTEXT regs, StackRef* outRefs, int* outCount) +// Runs the runtime's own ScanStackRoots-equivalent walk and Appends the +// resulting refs into the caller's SArray. Returns S_OK on a clean walk, +// or S_FALSE if any Append failed (OOM); the SArray may be missing its +// tail in that case. +static HRESULT CollectRuntimeStackRefs(Thread* pThread, PCONTEXT regs, SArray* outRefs) { RuntimeRefCollectionContext collectCtx; - collectCtx.count = 0; + collectCtx.refs = outRefs; collectCtx.overflow = false; + collectCtx.currentFrameSource = 0; + collectCtx.currentFrameSourceType = 0; + collectCtx.currentRegDisplay = nullptr; GCCONTEXT gcctx = {}; @@ -661,6 +765,7 @@ static bool CollectRuntimeStackRefs(Thread* pThread, PCONTEXT regs, StackRef* ou { DiagContext* dCtx = (DiagContext*)pData; GCCONTEXT* gcctx = dCtx->gcctx; + RuntimeRefCollectionContext* collectCtx = dCtx->collectCtx; ResetPointerHolder rph(&gcctx->cf); gcctx->cf = pCF; @@ -671,6 +776,13 @@ static bool CollectRuntimeStackRefs(Thread* pThread, PCONTEXT regs, StackRef* ou { if (pCF->IsFrameless()) { + // Frameless JIT frame: attribute refs to the native PC at the + // safepoint (matches DAC SOS_StackSourceIP convention). + collectCtx->currentFrameSource = + (CLRDATA_ADDRESS)PCODEToPINSTR(GetControlPC(pCF->GetRegisterSet())); + collectCtx->currentFrameSourceType = 0; // SOS_StackSourceIP + collectCtx->currentRegDisplay = pCF->GetRegisterSet(); + ICodeManager* pCM = pCF->GetCodeManager(); _ASSERTE(pCM != NULL); unsigned flags = pCF->GetCodeManagerFlags(); @@ -679,10 +791,18 @@ static bool CollectRuntimeStackRefs(Thread* pThread, PCONTEXT regs, StackRef* ou flags, GcEnumObject, gcctx); + + collectCtx->currentRegDisplay = nullptr; } else { + // Explicit Frame: attribute refs to the Frame address (matches + // DAC SOS_StackSourceFrame convention).
Explicit Frames don't + // emit register-resident refs, so leave currentRegDisplay null. Frame* pFrame = pCF->GetFrame(); + collectCtx->currentFrameSource = (CLRDATA_ADDRESS)dac_cast(pFrame); + collectCtx->currentFrameSourceType = 1; // SOS_StackSourceFrame + pFrame->GcScanRoots(gcctx->f, gcctx->sc); } } @@ -697,10 +817,7 @@ static bool CollectRuntimeStackRefs(Thread* pThread, PCONTEXT regs, StackRef* ou // does NOT include those. We intentionally omit GCFrame scanning here so our // runtime-side collection matches what the cDAC is expected to produce. - // Copy results out - *outCount = collectCtx.count; - memcpy(outRefs, collectCtx.refs, collectCtx.count * sizeof(StackRef)); - return !collectCtx.overflow; + return collectCtx.overflow ? S_FALSE : S_OK; } //----------------------------------------------------------------------------- @@ -727,50 +844,6 @@ static int FilterInteriorStackRefs(StackRef* refs, int count, Thread* pThread, u return writeIdx; } -//----------------------------------------------------------------------------- -// Deduplicate cDAC refs that have the same (Address, Object, Flags). -// The cDAC may walk the same managed frame at two different offsets due to -// Frames restoring context (e.g. InlinedCallFrame). The same stack slots -// get reported from both offsets. The runtime only walks each frame once, -// so we deduplicate to match. -//----------------------------------------------------------------------------- - -static int __cdecl CompareStackRefKey(const void* a, const void* b) -{ - const StackRef* refA = static_cast(a); - const StackRef* refB = static_cast(b); - if (refA->Address != refB->Address) - return (refA->Address < refB->Address) ? -1 : 1; - if (refA->Object != refB->Object) - return (refA->Object < refB->Object) ? -1 : 1; - if (refA->Flags != refB->Flags) - return (refA->Flags < refB->Flags) ? 
-1 : 1; - return 0; -} - -static int DeduplicateRefs(StackRef* refs, int count) -{ - if (count <= 1) - return count; - qsort(refs, count, sizeof(StackRef), CompareStackRefKey); - int writeIdx = 1; - for (int i = 1; i < count; i++) - { - // Only dedup stack-based refs (Address != 0). - // Register refs (Address == 0) are legitimately different entries - // even when Address/Object/Flags match (different registers). - if (refs[i].Address != 0 && - refs[i].Address == refs[i-1].Address && - refs[i].Object == refs[i-1].Object && - refs[i].Flags == refs[i-1].Flags) - { - continue; - } - refs[writeIdx++] = refs[i]; - } - return writeIdx; -} - //----------------------------------------------------------------------------- // Report mismatch //----------------------------------------------------------------------------- @@ -787,359 +860,433 @@ static void ReportMismatch(const char* message, Thread* pThread, PCONTEXT regs) } //----------------------------------------------------------------------------- -// Compare IXCLRDataStackWalk frame-by-frame between cDAC and legacy DAC. -// Creates a stack walk on each, advances in lockstep, and compares -// GetContext + Request(FRAME_DATA) at each step. +// FrameRefGroup helpers used by CompareFrames below. //----------------------------------------------------------------------------- -static void CompareStackWalks(Thread* pThread, PCONTEXT regs) +// Represents a group of refs from the same Source (managed frame or explicit Frame). 
+struct FrameRefGroup { - if (s_cdacProcess == nullptr || s_dacProcess == nullptr) - return; - - DWORD osThreadId = pThread->GetOSThreadId(); - - // Get IXCLRDataTask for the thread from both processes - IXCLRDataTask* cdacTask = nullptr; - IXCLRDataTask* dacTask = nullptr; + CLRDATA_ADDRESS Source; + int SourceType; // 0 = IP, 1 = Frame + int StartIdx; // Index into the original ref array + int Count; // Number of refs in this group +}; - HRESULT hr1 = s_cdacProcess->GetTaskByOSThreadID(osThreadId, &cdacTask); - HRESULT hr2 = s_dacProcess->GetTaskByOSThreadID(osThreadId, &dacTask); +// Build a sorted list of unique Sources with their ref index ranges. +// The refs array is sorted by Source as a side effect. +static int __cdecl CompareBySource(const void* a, const void* b) +{ + const StackRef* ra = static_cast(a); + const StackRef* rb = static_cast(b); + if (ra->Source != rb->Source) + return (ra->Source < rb->Source) ? -1 : 1; + return 0; +} - if (FAILED(hr1) || cdacTask == nullptr || FAILED(hr2) || dacTask == nullptr) - { - if (cdacTask) cdacTask->Release(); - if (dacTask) dacTask->Release(); +static void GroupRefsByFrame(StackRef* refs, int count, SArray* groups) +{ + if (count == 0) return; - } - // Create stack walks - IXCLRDataStackWalk* cdacWalk = nullptr; - IXCLRDataStackWalk* dacWalk = nullptr; + qsort(refs, count, sizeof(StackRef), CompareBySource); - hr1 = cdacTask->CreateStackWalk(0xF /* CLRDATA_SIMPFRAME_MANAGED_METHOD | ... 
*/, &cdacWalk); - hr2 = dacTask->CreateStackWalk(0xF, &dacWalk); + CLRDATA_ADDRESS currentSource = refs[0].Source; + int startIdx = 0; - cdacTask->Release(); - dacTask->Release(); - - if (FAILED(hr1) || cdacWalk == nullptr || FAILED(hr2) || dacWalk == nullptr) + for (int i = 1; i <= count; i++) { - if (cdacWalk) cdacWalk->Release(); - if (dacWalk) dacWalk->Release(); - return; - } - - // Walk in lockstep comparing each frame - int frameIdx = 0; - bool mismatch = false; - while (frameIdx < 200) // safety limit - { - // Compare GetContext - BYTE cdacCtx[4096] = {}; - BYTE dacCtx[4096] = {}; - ULONG32 cdacCtxSize = 0, dacCtxSize = 0; - - hr1 = cdacWalk->GetContext(0, sizeof(cdacCtx), &cdacCtxSize, cdacCtx); - hr2 = dacWalk->GetContext(0, sizeof(dacCtx), &dacCtxSize, dacCtx); - - if (hr1 != hr2) + if (i == count || refs[i].Source != currentSource) { - if (s_logFile) - fprintf(s_logFile, " [WALK_MISMATCH] Frame %d: GetContext hr mismatch cDAC=0x%x DAC=0x%x\n", - frameIdx, hr1, hr2); - mismatch = true; - break; - } - if (hr1 != S_OK) - break; // both finished - - if (cdacCtxSize != dacCtxSize) - { - if (s_logFile) - fprintf(s_logFile, " [WALK_MISMATCH] Frame %d: Context size differs cDAC=%u DAC=%u\n", - frameIdx, cdacCtxSize, dacCtxSize); - mismatch = true; - } - else if (cdacCtxSize >= sizeof(CONTEXT)) - { - // Compare IP and SP — these are what matter for stack walk parity. - // Other CONTEXT fields (floating-point, debug registers, xstate) may - // differ between cDAC and DAC without affecting the walk. 
- PCODE cdacIP = GetIP((CONTEXT*)cdacCtx); - PCODE dacIP = GetIP((CONTEXT*)dacCtx); - TADDR cdacSP = GetSP((CONTEXT*)cdacCtx); - TADDR dacSP = GetSP((CONTEXT*)dacCtx); - - if (cdacIP != dacIP || cdacSP != dacSP) + FrameRefGroup g; + g.Source = currentSource; + g.SourceType = refs[startIdx].SourceType; + g.StartIdx = startIdx; + g.Count = i - startIdx; + groups->Append(g); + if (i < count) { - if (s_logFile) - fprintf(s_logFile, " [WALK_MISMATCH] Frame %d: Context differs cDAC_IP=0x%llx cDAC_SP=0x%llx DAC_IP=0x%llx DAC_SP=0x%llx\n", - frameIdx, - (unsigned long long)cdacIP, (unsigned long long)cdacSP, - (unsigned long long)dacIP, (unsigned long long)dacSP); - mismatch = true; + currentSource = refs[i].Source; + startIdx = i; } } + } +} - // Compare Request(FRAME_DATA) - ULONG64 cdacFrameAddr = 0, dacFrameAddr = 0; - hr1 = cdacWalk->Request(0xf0000000, 0, nullptr, sizeof(cdacFrameAddr), (BYTE*)&cdacFrameAddr); - hr2 = dacWalk->Request(0xf0000000, 0, nullptr, sizeof(dacFrameAddr), (BYTE*)&dacFrameAddr); - - if (hr1 == S_OK && hr2 == S_OK && cdacFrameAddr != dacFrameAddr) +// Compare refs within a single frame using exact matching on a canonical key. +// Writes the number of unmatched refs in each set to unmatchedA / unmatchedB. +// +// Canonical key per ref: +// - Address == 0 -> register-resident ref. Key = (Register, Object, Flags). +// cDAC reports Address=0 + Register set; runtime mirrors +// this convention by setting Address to 0 when +// IdentifyRegisterFromPpObj recovers a register number. +// - Address != 0 -> stack-slot ref. Key = (Address, Object, Flags). +// Register/Offset are metadata describing how the JIT +// addressed the slot (e.g. [rbp-0x10]) and are NOT +// part of the matching key. +// +// A ref is considered matched iff it has an unused partner with an identical +// canonical key. There is no fuzzy fallback - any unmatched ref is a real +// disagreement that needs a clear diagnostic.
+static void CompareFrameRefs(StackRef* refsA, int countA, StackRef* refsB, int countB, + int* unmatchedA, int* unmatchedB, + bool* aUsed, bool* bUsed) +{ + for (int i = 0; i < countA; i++) + { + bool aIsReg = refsA[i].Address == 0; + for (int j = 0; j < countB; j++) { - if (s_logFile) + if (bUsed[j]) continue; + bool bIsReg = refsB[j].Address == 0; + if (aIsReg != bIsReg) continue; + if (refsA[i].Object != refsB[j].Object) continue; + if (refsA[i].Flags != refsB[j].Flags) continue; + if (aIsReg) { - PCODE cdacIP = 0, dacIP = 0; - if (cdacCtxSize >= sizeof(CONTEXT)) - cdacIP = GetIP((CONTEXT*)cdacCtx); - if (dacCtxSize >= sizeof(CONTEXT)) - dacIP = GetIP((CONTEXT*)dacCtx); - fprintf(s_logFile, " [WALK_MISMATCH] Frame %d: FrameAddr cDAC=0x%llx DAC=0x%llx (cDAC_IP=0x%llx DAC_IP=0x%llx)\n", - frameIdx, (unsigned long long)cdacFrameAddr, (unsigned long long)dacFrameAddr, - (unsigned long long)cdacIP, (unsigned long long)dacIP); + if (refsA[i].Register != refsB[j].Register) continue; } - mismatch = true; - } - - // Advance both - hr1 = cdacWalk->Next(); - hr2 = dacWalk->Next(); - - if (hr1 != hr2) - { - if (s_logFile) - fprintf(s_logFile, " [WALK_MISMATCH] Frame %d: Next hr mismatch cDAC=0x%x DAC=0x%x\n", - frameIdx, hr1, hr2); - mismatch = true; + else + { + if (refsA[i].Address != refsB[j].Address) continue; + } + aUsed[i] = bUsed[j] = true; break; } - if (hr1 != S_OK) - break; // both finished - - frameIdx++; } - if (!mismatch && s_logFile) - fprintf(s_logFile, " [WALK_OK] %d frames matched between cDAC and DAC\n", frameIdx); - - cdacWalk->Release(); - dacWalk->Release(); + *unmatchedA = 0; + *unmatchedB = 0; + for (int i = 0; i < countA; i++) + if (!aUsed[i]) (*unmatchedA)++; + for (int j = 0; j < countB; j++) + if (!bUsed[j]) (*unmatchedB)++; } //----------------------------------------------------------------------------- -//----------------------------------------------------------------------------- -// Compare two ref sets using two-phase matching. 
-// Phase 1: Match stack refs (Address != 0) by exact (Address, Object, Flags). -// Phase 2: Match register refs (Address == 0) by (Object, Flags) only. -// Returns true if all refs in setA have a match in setB and counts are equal. +// Per-frame comparison. +// +// Group both ref sets by Source (managed PC for frameless JIT frames, +// Frame* for explicit Frames), merge-walk the two grouped lists, and per +// matching frame compare refs with CompareFrameRefs. Captures full per-frame +// results (including per-ref dispositions in a shared SArray) into the +// caller-owned FrameResult SArray. Pure data transform: no I/O, no counter +// side effects. +// +// Mismatch classification (runtime is the oracle): +// - cDAC-only frame: MISMATCH +// - RT-only frame, Source deferred: KNOWN_NIE +// - RT-only frame, not deferred: MISMATCH +// - Same Source, refs don't match: MISMATCH //----------------------------------------------------------------------------- -static bool CompareRefSets(StackRef* refsA, int countA, StackRef* refsB, int countB) +struct CompareVerdict { - if (countA != countB) - return false; - if (countA == 0) - return true; - if (countA > MAX_COLLECTED_REFS) - return false; + bool pass; // every frame's refs matched (no mismatches at all) + bool allKnown; // !pass, but every mismatching frame is a deferred Source +}; - bool matched[MAX_COLLECTED_REFS] = {}; +enum FrameOutcome : unsigned char +{ + FRAME_OUTCOME_MATCH = 0, // both sides emitted this frame, all refs matched + FRAME_OUTCOME_MISMATCH = 1, // real disagreement (ref-level or frame-only) + FRAME_OUTCOME_KNOWN_NIE = 2, // RT-only frame, Source on cDAC deferred list +}; - for (int i = 0; i < countA; i++) +// Result of comparing one frame. Carries enough state for the renderer to +// reconstruct the whole frame (counts, SPs, disposition of each ref) without +// re-walking the comparison. +// +// Per-ref dispositions are stored in a shared SArray owned +// by the caller of CompareFrames. 
CdacDispStart/RtDispStart index into that +// buffer; the disposition for ref i on the cDAC side is at +// dispBuf[CdacDispStart + i]. Storing them out-of-band keeps FrameResult +// trivially copyable (SArray::Append memcpys it) and avoids +// any per-frame ref count cap. +struct FrameResult +{ + CLRDATA_ADDRESS Source; + int SourceType; + + CLRDATA_ADDRESS SP_cdac; // 0 if cDAC didn't have this frame + CLRDATA_ADDRESS SP_rt; // 0 if RT didn't have this frame + + int CdacStart; // Index into the original cDAC ref array + int CdacCount; // Refs cDAC reported for this frame + int RtStart; // Index into the original RT ref array + int RtCount; // Refs RT reported for this frame + + int CdacDispStart; // Index into shared disp buffer + int RtDispStart; // Index into shared disp buffer + + FrameOutcome Outcome; +}; + +// Helper used by CompareFrames to append per-ref dispositions into the shared +// disposition buffer. Returns the start index where the dispositions were +// written, suitable for storing in FrameResult::*DispStart. +static int AppendDispositions(SArray<RefDisposition>* dispBuf, + const bool* used, int count, bool isNie) +{ + int start = (int)dispBuf->GetCount(); + for (int i = 0; i < count; i++) { - if (refsA[i].Address == 0) - continue; - bool found = false; - for (int j = 0; j < countB; j++) - { - if (matched[j]) continue; - if (refsA[i].Address == refsB[j].Address && - refsA[i].Object == refsB[j].Object && - refsA[i].Flags == refsB[j].Flags) - { - matched[j] = true; - found = true; - break; - } - } - if (!found) return false; + RefDisposition d; + if (used[i]) d = REF_MATCHED; + else if (isNie) d = REF_NIE; + else d = REF_ONLY; + dispBuf->Append(d); } + return start; +} - for (int i = 0; i < countA; i++) +// Walks the grouped frames once and appends per-frame results into outResults. +// Both ref arrays must be in source-order (qsort'd in GroupRefsByFrame). 
+// Per-ref dispositions are appended into the caller-owned dispBuf SArray; +// FrameResult::CdacDispStart / RtDispStart index back into dispBuf, and +// CdacStart / RtStart index back into refsCdac / refsRt for the underlying +// ref data. +static int CompareFrames( + StackRef* refsCdac, int countCdac, + StackRef* refsRt, int countRt, + const CLRDATA_ADDRESS* deferred, int deferredCount, + SArray<FrameResult>* outResults, + SArray<RefDisposition>* dispBuf) +{ + SArray<FrameRefGroup> groupsCdac, groupsRt; + GroupRefsByFrame(refsCdac, countCdac, &groupsCdac); + GroupRefsByFrame(refsRt, countRt, &groupsRt); + + int numGroupsCdac = (int)groupsCdac.GetCount(); + int numGroupsRt = (int)groupsRt.GetCount(); + + int idxCdac = 0, idxRt = 0; + int resultCount = 0; + + auto addResult = [&]() -> FrameResult* { + FrameResult fr; + memset(&fr, 0, sizeof(fr)); + outResults->Append(fr); + resultCount++; + return &(*outResults)[outResults->GetCount() - 1]; + }; + + while (idxCdac < numGroupsCdac || idxRt < numGroupsRt) { - if (refsA[i].Address != 0) - continue; - bool found = false; - for (int j = 0; j < countB; j++) + bool bothHave = idxCdac < numGroupsCdac && idxRt < numGroupsRt + && groupsCdac[idxCdac].Source == groupsRt[idxRt].Source; + bool cdacOnly = idxRt >= numGroupsRt + || (idxCdac < numGroupsCdac + && groupsCdac[idxCdac].Source < groupsRt[idxRt].Source); + + FrameResult* fr = addResult(); + + if (bothHave) { - if (matched[j]) continue; - if (refsA[i].Object == refsB[j].Object && - refsA[i].Flags == refsB[j].Flags) - { - matched[j] = true; - found = true; - break; - } + FrameRefGroup& gC = groupsCdac[idxCdac]; + FrameRefGroup& gR = groupsRt[idxRt]; + + int cC = gC.Count; + int cR = gR.Count; + NewArrayHolder<bool> cUsed(new bool[cC]()); + NewArrayHolder<bool> rUsed(new bool[cR]()); + + int unmatchedA = 0, unmatchedB = 0; + CompareFrameRefs(&refsCdac[gC.StartIdx], cC, + &refsRt[gR.StartIdx], cR, + &unmatchedA, &unmatchedB, cUsed, rUsed); + + fr->Source = gC.Source; + fr->SourceType = gC.SourceType; + fr->SP_cdac = 
refsCdac[gC.StartIdx].StackPointer; + fr->SP_rt = refsRt[gR.StartIdx].StackPointer; + fr->CdacStart = gC.StartIdx; + fr->CdacCount = cC; + fr->RtStart = gR.StartIdx; + fr->RtCount = cR; + fr->Outcome = (unmatchedA > 0 || unmatchedB > 0) + ? FRAME_OUTCOME_MISMATCH + : FRAME_OUTCOME_MATCH; + fr->CdacDispStart = AppendDispositions(dispBuf, cUsed, cC, /*isNie=*/false); + fr->RtDispStart = AppendDispositions(dispBuf, rUsed, cR, /*isNie=*/false); + idxCdac++; + idxRt++; + } + else if (cdacOnly) + { + FrameRefGroup& gC = groupsCdac[idxCdac]; + fr->Source = gC.Source; + fr->SourceType = gC.SourceType; + fr->SP_cdac = refsCdac[gC.StartIdx].StackPointer; + fr->SP_rt = 0; + fr->CdacStart = gC.StartIdx; + fr->CdacCount = gC.Count; + fr->RtStart = 0; + fr->RtCount = 0; + fr->Outcome = FRAME_OUTCOME_MISMATCH; + fr->CdacDispStart = (int)dispBuf->GetCount(); + for (int i = 0; i < gC.Count; i++) dispBuf->Append(REF_ONLY); + fr->RtDispStart = (int)dispBuf->GetCount(); + idxCdac++; + } + else + { + // Frame only in RT. KNOWN_NIE iff Source is on the deferred list. + FrameRefGroup& gR = groupsRt[idxRt]; + bool isKnownNie = IsDeferredFrame(gR.Source, deferred, deferredCount); + fr->Source = gR.Source; + fr->SourceType = gR.SourceType; + fr->SP_cdac = 0; + fr->SP_rt = refsRt[gR.StartIdx].StackPointer; + fr->CdacStart = 0; + fr->CdacCount = 0; + fr->RtStart = gR.StartIdx; + fr->RtCount = gR.Count; + fr->Outcome = isKnownNie ? FRAME_OUTCOME_KNOWN_NIE + : FRAME_OUTCOME_MISMATCH; + fr->CdacDispStart = (int)dispBuf->GetCount(); + fr->RtDispStart = (int)dispBuf->GetCount(); + RefDisposition d = isKnownNie ? REF_NIE : REF_ONLY; + for (int i = 0; i < gR.Count; i++) dispBuf->Append(d); + idxRt++; } - if (!found) return false; } - return true; + return resultCount; } -//----------------------------------------------------------------------------- -// Filter interior stack pointers and deduplicate a ref set in place. 
-//----------------------------------------------------------------------------- - -static int FilterAndDedup(StackRef* refs, int count, Thread* pThread, uintptr_t stackLimit) +// Walks FrameResult[] once and derives the verdict + advances global frame +// counters. Counters are bumped exactly once per call. +static CompareVerdict ComputeVerdict(const FrameResult* frames, int frameCount) { - count = FilterInteriorStackRefs(refs, count, pThread, stackLimit); - count = DeduplicateRefs(refs, count); - return count; + int trueDiff = 0, knownDiff = 0; + for (int i = 0; i < frameCount; i++) + { + InterlockedIncrement(&s_frameTotal); + switch (frames[i].Outcome) + { + case FRAME_OUTCOME_MATCH: + InterlockedIncrement(&s_frameMatch); + break; + case FRAME_OUTCOME_MISMATCH: + InterlockedIncrement(&s_frameMismatch); + trueDiff++; + break; + case FRAME_OUTCOME_KNOWN_NIE: + InterlockedIncrement(&s_frameKnownNie); + knownDiff++; + break; + } + } + CompareVerdict v; + v.pass = (trueDiff == 0 && knownDiff == 0); + v.allKnown = (trueDiff == 0 && knownDiff > 0); + return v; } -//----------------------------------------------------------------------------- -// Main entry point: verify at a GC stress point -//----------------------------------------------------------------------------- - -bool CdacStress::ShouldSkipStressPoint() +// Extract CDAC_DEFERRED_FRAME sentinel entries from a cDAC ref array. +// Removes them in-place (shifting later elements down), writes their Source +// addresses into `deferredOut`, and returns the new ref count. Sentinels are +// emitted by GcScanContext.RecordDeferredFrame for explicit Frames whose cDAC +// scan path is not implemented yet (typically PromoteCallerStack pending the +// ArgIterator port). 
+static int ExtractDeferredFrames( + StackRef* refs, int count, + CLRDATA_ADDRESS* deferredOut, int* pDeferredCount, int deferredMax) { - LONG count = InterlockedIncrement(&s_verifyCount); - - if (s_step <= 1) - return false; - - return (count % s_step) != 0; + int dst = 0; + int deferred = 0; + for (int i = 0; i < count; i++) + { + if ((refs[i].Flags & CDAC_DEFERRED_FRAME) != 0) + { + if (deferred < deferredMax) + deferredOut[deferred++] = refs[i].Source; + continue; + } + if (dst != i) + refs[dst] = refs[i]; + dst++; + } + *pDeferredCount = deferred; + return dst; } -void CdacStress::VerifyAtAllocPoint() +static bool IsDeferredFrame(CLRDATA_ADDRESS source, const CLRDATA_ADDRESS* deferred, int deferredCount) { - if (!s_initialized) - return; - - // Reentrancy guard: allocations inside VerifyAtStressPoint (e.g., SArray) - // would trigger this function again, causing deadlock on s_cdacLock. - if (t_inVerification) - return; - - Thread* pThread = GetThreadNULLOk(); - if (pThread == nullptr || !pThread->PreemptiveGCDisabled()) - return; - - CONTEXT ctx; - RtlCaptureContext(&ctx); - VerifyAtStressPoint(pThread, &ctx); + for (int i = 0; i < deferredCount; i++) + { + if (deferred[i] == source) + return true; + } + return false; } -void CdacStress::VerifyAtStressPoint(Thread* pThread, PCONTEXT regs) +//----------------------------------------------------------------------------- +// Stress verification implementation: shared by all trigger-point +// specializations below. Compares cDAC vs runtime stack refs at the captured +// CONTEXT and records per-frame results. +//----------------------------------------------------------------------------- + +static void VerifyAtStressPoint(Thread* pThread, PCONTEXT regs) { _ASSERTE(s_initialized); _ASSERTE(pThread != nullptr); _ASSERTE(regs != nullptr); - // RAII guard: set t_inVerification=true on entry, false on exit. 
- // Prevents infinite recursion when allocations inside this function - // trigger VerifyAtAllocPoint again (which would deadlock on s_cdacLock). - struct ReentrancyGuard { - ReentrancyGuard() { t_inVerification = true; } - ~ReentrancyGuard() { t_inVerification = false; } - } reentrancyGuard; - // Serialize cDAC access — the cDAC's ProcessedData cache and COM interfaces // are not thread-safe, and GC stress can fire on multiple threads. CrstHolder cdacLock(&s_cdacLock); - // Unique-stack filtering: use IP + SP as a stack identity. - // This skips re-verification at the same code location with the same stack depth. - if (IsUniqueEnabled() && s_seenStacks != nullptr) - { - SIZE_T stackHash = GetIP(regs) ^ (GetSP(regs) * 2654435761u); - if (s_seenStacks->LookupPtr(stackHash) != nullptr) - return; - s_seenStacks->Add(stackHash); - } + DWORD osThreadId = pThread->GetOSThreadId(); - // Set the thread context for the cDAC's ReadThreadContext callback. - s_currentContext = regs; - s_currentThreadId = pThread->GetOSThreadId(); + // Phase A: Collect raw refs from both sides (independent walks). - // Flush the cDAC's ProcessedData cache so it re-reads from the live process. - if (s_cdacProcess != nullptr) + // A.1: cDAC side. ReadThreadContext callback state is wired here so the + // cDAC can return the captured CONTEXT for the active thread. + SArray<StackRef> cdacRefs; + HRESULT cdacHr; { - s_cdacProcess->Flush(); - } + s_currentContext = regs; + s_currentThreadId = osThreadId; - // Flush the legacy DAC cache too. - if (s_dacProcess != nullptr) - { - s_dacProcess->Flush(); - } + // Flush only target-state caches (process state can change), keep + // immutable metadata caches (e.g. CoreLib type info) populated. + if (s_cdacProcess != nullptr) + s_cdacProcess->Request(DACSTRESSPRIV_REQUEST_FLUSH_TARGET_STATE, 0, NULL, 0, NULL); - // Compare IXCLRDataStackWalk frame-by-frame between cDAC and legacy DAC. 
- if (s_cdacStressLevel & CDACSTRESS_WALK) - { - CompareStackWalks(pThread, regs); - } + cdacHr = CollectCdacStackRefs(s_cdacSosDac, osThreadId, &cdacRefs); - // Compare GC stack references. - if (!(s_cdacStressLevel & CDACSTRESS_REFS)) - { s_currentContext = nullptr; s_currentThreadId = 0; - return; } - // Step 1: Collect raw refs from cDAC (always) and DAC (if USE_DAC). - DWORD osThreadId = pThread->GetOSThreadId(); - - SArray<StackRef> cdacRefs; - bool haveCdac = CollectStackRefs(s_cdacSosDac, osThreadId, &cdacRefs); + // A.2: Runtime side -- the oracle (GC's own ScanStackRoots-equivalent walk). + SArray<StackRef> runtimeRefs; + HRESULT rtHr = CollectRuntimeStackRefs(pThread, regs, &runtimeRefs); + int runtimeCount = (int)runtimeRefs.GetCount(); - SArray<StackRef> dacRefs; - bool haveDac = false; - if (s_cdacStressLevel & CDACSTRESS_USE_DAC) + if (FAILED(cdacHr)) { - haveDac = (s_dacSosDac != nullptr) && CollectStackRefs(s_dacSosDac, osThreadId, &dacRefs); + InterlockedIncrement(&s_failCount); + CDAC_LOG("[FAIL] Thread=0x%x IP=0x%p - cDAC GetStackReferences failed (hr=0x%08x)\n", + osThreadId, (void*)GetIP(regs), cdacHr); + return; } - - s_currentContext = nullptr; - s_currentThreadId = 0; - - StackRef runtimeRefsBuf[MAX_COLLECTED_REFS]; - int runtimeCount = 0; - bool haveRuntime = CollectRuntimeStackRefs(pThread, regs, runtimeRefsBuf, &runtimeCount); - - if (!haveCdac || !haveRuntime) + if (rtHr == S_FALSE) { - InterlockedIncrement(&s_verifySkip); - if (s_logFile != nullptr) - { - if (!haveCdac) - fprintf(s_logFile, "[SKIP] Thread=0x%x IP=0x%p - cDAC GetStackReferences failed\n", - osThreadId, (void*)GetIP(regs)); - else - fprintf(s_logFile, "[SKIP] Thread=0x%x IP=0x%p - runtime CollectRuntimeStackRefs overflowed\n", - osThreadId, (void*)GetIP(regs)); - } + // OOM mid-Append; comparing a truncated set risks a false PASS. 
+ InterlockedIncrement(&s_failCount); + CDAC_LOG("[FAIL] Thread=0x%x IP=0x%p - RT collection OOM after %d refs\n", + osThreadId, (void*)GetIP(regs), runtimeCount); return; } - // Step 2: Compare cDAC vs DAC raw (before any filtering). - int rawCdacCount = (int)cdacRefs.GetCount(); - int rawDacCount = haveDac ? (int)dacRefs.GetCount() : -1; - bool dacMatch = true; - if (haveDac) - { - StackRef* cdacBuf = cdacRefs.OpenRawBuffer(); - StackRef* dacBuf = dacRefs.OpenRawBuffer(); - dacMatch = CompareRefSets(cdacBuf, rawCdacCount, dacBuf, rawDacCount); - cdacRefs.CloseRawBuffer(); - dacRefs.CloseRawBuffer(); - } + // Phase B: Normalize the cDAC side so it can compare directly with RT. - // Step 3: Filter cDAC refs and compare vs RT (always). + // B.1: Live-stack upper bound. PromoteCarefully (siginfo.cpp) drops + // interior pointers whose value lies in the live stack [topStack, ...). + // We mirror that filter on the cDAC side in B.3. Frame* pTopFrame = pThread->GetFrame(); Object** topStack = (Object**)pTopFrame; if (InlinedCallFrame::FrameHasActiveCall(pTopFrame)) @@ -1149,60 +1296,363 @@ void CdacStress::VerifyAtStressPoint(Thread* pThread, PCONTEXT regs) } uintptr_t stackLimit = (uintptr_t)topStack; - int filteredCdacCount = rawCdacCount; - if (filteredCdacCount > 0) + // B.2: Extract CDAC_DEFERRED_FRAME sentinels from the cDAC ref set. + // These are markers (not real refs) emitted when the cDAC intentionally + // skips a Frame whose scan path is not implemented yet. Their Source + // addresses are used in Phase C to re-classify diffs as known issues. 
+ CLRDATA_ADDRESS deferredFrames[MAX_DEFERRED_FRAMES]; + int deferredFrameCount = 0; + int cdacCount = (int)cdacRefs.GetCount(); + if (cdacCount > 0) { - StackRef* cdacBuf = cdacRefs.OpenRawBuffer(); - filteredCdacCount = FilterAndDedup(cdacBuf, filteredCdacCount, pThread, stackLimit); + StackRef* buf = cdacRefs.OpenRawBuffer(); + cdacCount = ExtractDeferredFrames( + buf, cdacCount, + deferredFrames, &deferredFrameCount, MAX_DEFERRED_FRAMES); cdacRefs.CloseRawBuffer(); } - runtimeCount = DeduplicateRefs(runtimeRefsBuf, runtimeCount); + // B.3: Mirror PromoteCarefully's interior-into-stack filter on the cDAC + // side. The cDAC reports raw GcInfo slots without this filter. + if (cdacCount > 0) + { + StackRef* buf = cdacRefs.OpenRawBuffer(); + cdacCount = FilterInteriorStackRefs(buf, cdacCount, pThread, stackLimit); + cdacRefs.CloseRawBuffer(); + } + + // Phase C: Compare per-frame. CompareFrames is a pure data transform; + // ComputeVerdict bumps the global counters once. + SArray<FrameResult> frameResults; + SArray<RefDisposition> dispBuf; StackRef* cdacBuf = cdacRefs.OpenRawBuffer(); - bool rtMatch = CompareRefSets(cdacBuf, filteredCdacCount, runtimeRefsBuf, runtimeCount); + StackRef* runtimeBuf = runtimeRefs.OpenRawBuffer(); + int frameCount = CompareFrames( + cdacBuf, cdacCount, + runtimeBuf, runtimeCount, + deferredFrames, deferredFrameCount, + &frameResults, &dispBuf); + const RefDisposition* dispPtr = dispBuf.GetElements(); + CompareVerdict verdict = ComputeVerdict(frameResults.GetElements(), frameCount); + + // Phase D: Bucket the outcome and (on mismatch) emit hierarchical + // diagnostics: one block per broken frame, then one stack trace. 
+ if (verdict.pass) + InterlockedIncrement(&s_passCount); + else if (verdict.allKnown) + InterlockedIncrement(&s_knownIssueCount); + else + InterlockedIncrement(&s_failCount); + + if (verdict.pass) + { + CDAC_LOG("[PASS] Thread=0x%x IP=0x%p cDAC=%d RT=%d frames=%d\n", + osThreadId, (void*)GetIP(regs), cdacCount, runtimeCount, frameCount); + } + else if (s_logFile != nullptr) + { + const char* label = verdict.allKnown ? "KNOWN_ISSUE" : "FAIL"; + + // Per-trigger-point frame breakdown — lets a reader confirm at a + // glance that a KNOWN_ISSUE has zero real mismatch frames. + int fMatch = 0, fMismatch = 0, fNie = 0; + for (int i = 0; i < frameCount; i++) + { + switch (frameResults[i].Outcome) + { + case FRAME_OUTCOME_MATCH: fMatch++; break; + case FRAME_OUTCOME_MISMATCH: fMismatch++; break; + case FRAME_OUTCOME_KNOWN_NIE: fNie++; break; + } + } + + CDAC_LOG("[%s] Thread=0x%x IP=0x%p cDAC=%d RT=%d frames=%d (match=%d mismatch=%d known_nie=%d)\n", + label, osThreadId, (void*)GetIP(regs), cdacCount, runtimeCount, + frameCount, fMatch, fMismatch, fNie); + + bool verbose = IsCdacStressVerboseEnabled(); + + // Per-broken-frame blocks. Matched frames are omitted entirely in + // concise mode; verbose mode still emits matched refs under their + // [STACK_TRACE] entry. Frame numbering matches the stack trace + // emitted at the end. + for (int i = 0; i < frameCount; i++) + { + const FrameResult& fr = frameResults[i]; + if (fr.Outcome == FRAME_OUTCOME_MATCH) + continue; + + char methodName[256]; + ResolveMethodName(fr.Source, fr.SourceType, methodName, sizeof(methodName)); + + const char* outcomeName = + fr.Outcome == FRAME_OUTCOME_MISMATCH ? "MISMATCH" : + fr.Outcome == FRAME_OUTCOME_KNOWN_NIE ? 
"KNOWN_NIE" : "?"; + + const char* spNote = ""; + if (fr.SP_cdac != 0 && fr.SP_rt != 0 && fr.SP_cdac != fr.SP_rt) + spNote = " <-- SP MISMATCH"; + + CDAC_LOG(" Frame #%d %s [%s] cDAC=%d RT=%d SP_cDAC=0x%llx SP_RT=0x%llx%s\n", + i, methodName, outcomeName, fr.CdacCount, fr.RtCount, + (unsigned long long)fr.SP_cdac, (unsigned long long)fr.SP_rt, + spNote); + + // Per-ref dump. Verbose -> all refs; concise -> only non-MATCHED + // refs (which is the actionable signal — what diverges). + for (int j = 0; j < fr.CdacCount; j++) + { + RefDisposition d = dispPtr[fr.CdacDispStart + j]; + if (!verbose && d == REF_MATCHED) + continue; + LogRef(d, cdacBuf[fr.CdacStart + j]); + } + for (int j = 0; j < fr.RtCount; j++) + { + RefDisposition d = dispPtr[fr.RtDispStart + j]; + if (!verbose && d == REF_MATCHED) + continue; + LogRef(d, runtimeBuf[fr.RtStart + j]); + } + } + + // One stack trace at the end of the stress-point block, with markers + // on the broken frames so a reader can correlate Frame #N above to + // the same #N here. + CDAC_LOG(" [STACK_TRACE] (cDAC=%d RT=%d frames=%d)\n", + cdacCount, runtimeCount, frameCount); + for (int i = 0; i < frameCount; i++) + { + char methodName[256]; + ResolveMethodName(frameResults[i].Source, frameResults[i].SourceType, + methodName, sizeof(methodName)); + + const char* marker = ""; + switch (frameResults[i].Outcome) + { + case FRAME_OUTCOME_MATCH: marker = ""; break; + case FRAME_OUTCOME_MISMATCH: marker = " <-- MISMATCH"; break; + case FRAME_OUTCOME_KNOWN_NIE: marker = " <-- KNOWN_NIE (PromoteCallerStack deferred)"; break; + } + CDAC_LOG(" #%d %s (cDAC=%d RT=%d)%s\n", + i, methodName, frameResults[i].CdacCount, frameResults[i].RtCount, marker); + } + + fflush(s_logFile); + } + cdacRefs.CloseRawBuffer(); + runtimeRefs.CloseRawBuffer(); +} + +//----------------------------------------------------------------------------- +// Trigger-point specializations: each MaybeVerify is invoked at the wired +// runtime site. 
They gate on IsEnabled, capture the caller's CONTEXT, and +// hand off to VerifyAtStressPoint for the shared work. +//----------------------------------------------------------------------------- + +void CdacStress<cdac_on_alloc>::MaybeVerify() +{ + if (!IsEnabled()) + return; + + Thread* pThread = GetThreadNULLOk(); + if (pThread == nullptr || !pThread->PreemptiveGCDisabled()) + return; + + // The walk will start from inside MaybeVerify itself; the comparison + // treats this frame as just another frame (no need to skip it). + CONTEXT ctx; + RtlCaptureContext(&ctx); + + VerifyAtStressPoint(pThread, &ctx); +} + +//============================================================================= +// Rendering helpers +// +// All textual formatting / log emission for cdacstress lives here. Forward +// declarations near the top of the file allow the main logic to call into +// these helpers without inlining the formatting code into the algorithm. +// Adding new log shapes (e.g. new [DEBUG_*] blocks) belongs in this section. +//============================================================================= - // Step 4: Pass requires cDAC vs RT match. - // DAC mismatch is logged separately but doesn't affect pass/fail. - bool pass = rtMatch; +static const char* SideName(RefSide s) +{ + return s == SIDE_CDAC ? "cDAC" : "RT"; +} + +// Pretty-print a processor-encoding register number for the current target. +// Returns a short interned string. Unknown values render as "?". +// +// Register numbering matches the GcInfo encoding for each architecture +// (i.e. what gcdump's GetRegName / RegName produces). Negative values are +// rendered as "-" (meaning "ref is stack-resident, not register-resident"). 
+static const char* RegisterName(int reg) +{ +#if defined(TARGET_AMD64) + static const char* names[16] = { + "rax","rcx","rdx","rbx","rsp","rbp","rsi","rdi", + "r8","r9","r10","r11","r12","r13","r14","r15" + }; + if (reg >= 0 && reg < 16) return names[reg]; +#elif defined(TARGET_ARM64) + static const char* names[32] = { + "x0","x1","x2","x3","x4","x5","x6","x7", + "x8","x9","x10","x11","x12","x13","x14","x15", + "x16","x17","x18","x19","x20","x21","x22","x23", + "x24","x25","x26","x27","x28","fp","lr","sp" + }; + if (reg >= 0 && reg < 32) return names[reg]; +#elif defined(TARGET_X86) + static const char* names[8] = { + "eax","ecx","edx","ebx","esp","ebp","esi","edi" + }; + if (reg >= 0 && reg < 8) return names[reg]; +#elif defined(TARGET_ARM) + static const char* names[16] = { + "r0","r1","r2","r3","r4","r5","r6","r7", + "r8","r9","r10","r11","r12","sp","lr","pc" + }; + if (reg >= 0 && reg < 16) return names[reg]; +#elif defined(TARGET_LOONGARCH64) + static const char* names[33] = { + "r0","ra","tp","sp","a0","a1","a2","a3", + "a4","a5","a6","a7","t0","t1","t2","t3", + "t4","t5","t6","t7","t8","x0","fp","s0", + "s1","s2","s3","s4","s5","s6","s7","s8", + "pc" + }; + if (reg >= 0 && reg < 33) return names[reg]; +#elif defined(TARGET_RISCV64) + static const char* names[33] = { + "r0","ra","sp","gp","tp","t0","t1","t2", + "fp","s1","a0","a1","a2","a3","a4","a5", + "a6","a7","s2","s3","s4","s5","s6","s7", + "s8","s9","s10","s11","t3","t4","t5","t6", + "pc" + }; + if (reg >= 0 && reg < 33) return names[reg]; +#endif + if (reg < 0) return "-"; + return "?"; +} - if (pass) - InterlockedIncrement(&s_verifyPass); +// Format ref Flags as a bit-name list (e.g. "Interior|Pinned" or "-"). +// Writes into caller-supplied buffer to avoid TLS / allocation. 
+static const char* FormatRefFlags(unsigned int flags, char* buf, size_t bufLen) +{ + if (flags == 0) { strncpy_s(buf, bufLen, "-", _TRUNCATE); return buf; } + buf[0] = '\0'; + bool first = true; + auto append = [&](const char* s) { + if (!first) strncat_s(buf, bufLen, "|", _TRUNCATE); + strncat_s(buf, bufLen, s, _TRUNCATE); + first = false; + }; + if (flags & SOSRefInterior) append("Interior"); + if (flags & SOSRefPinned) append("Pinned"); + unsigned int known = SOSRefInterior | SOSRefPinned; + if (flags & ~known) + { + char other[24]; + sprintf_s(other, ARRAY_SIZE(other), "0x%x", flags & ~known); + append(other); + } + return buf; +} + +static const char* DispositionName(RefDisposition d) +{ + switch (d) + { + case REF_MATCHED: return "MATCHED"; + case REF_ONLY: return "ONLY"; + case REF_NIE: return "NIE"; + default: return "?"; + } +} + +// Concise per-ref line. Side label is derived from ref.Side; disposition is +// supplied by the comparison layer. No-op if s_logFile is nullptr. +static void LogRefConcise(RefDisposition disp, const StackRef& r) +{ + CDAC_LOG(" [%s(%s)] Addr=0x%llx Obj=0x%llx Flags=0x%x Reg=%d Off=%d\n", + DispositionName(disp), SideName(r.Side), + (unsigned long long)r.Address, (unsigned long long)r.Object, r.Flags, + r.Register, r.Offset); +} + +// Verbose per-ref line — emitted when CDACSTRESS_VERBOSE is on. No-op if +// s_logFile is nullptr. +static void LogRefVerbose(RefDisposition disp, const StackRef& r) +{ + char flagBuf[64]; + FormatRefFlags(r.Flags, flagBuf, ARRAY_SIZE(flagBuf)); + + const char* regName = RegisterName(r.Register); + bool hasReg = r.Register >= 0; + + CDAC_LOG( + " [%s(%s)] Addr=0x%llx Obj=0x%llx Flags=%s HasReg=%s Reg=%s(%d) Off=%d SP=0x%llx\n", + DispositionName(disp), SideName(r.Side), + (unsigned long long)r.Address, + (unsigned long long)r.Object, + flagBuf, + hasReg ? 
"Y" : "N", + regName, r.Register, + r.Offset, + (unsigned long long)r.StackPointer); +} + +// Dispatch to verbose or concise based on the global flag. +static void LogRef(RefDisposition disp, const StackRef& r) +{ + if (IsCdacStressVerboseEnabled()) + LogRefVerbose(disp, r); else - InterlockedIncrement(&s_verifyFail); + LogRefConcise(disp, r); +} - // Step 5: Log results. - if (s_logFile != nullptr) +// Resolve a managed IP (or Frame*) to a printable name for log output. +// Falls back to "" or "" if resolution fails. +// Uses the cDAC's ISOSDacInterface by default; we're running in-process +// against live pointers, so dereferencing Frame* directly is safe (no DAC +// marshaling needed). +static void ResolveMethodName(CLRDATA_ADDRESS source, int sourceType, char* buf, int bufLen) +{ + if (bufLen <= 0) + return; + + if (sourceType != 0) // SOS_StackSourceFrame { - const char* label = pass ? "PASS" : "FAIL"; - if (pass && !dacMatch) - label = "DAC_MISMATCH"; - fprintf(s_logFile, "[%s] Thread=0x%x IP=0x%p cDAC=%d DAC=%d RT=%d\n", - label, osThreadId, (void*)GetIP(regs), - rawCdacCount, rawDacCount, runtimeCount); - - if (!pass || !dacMatch) + Frame* pFrame = reinterpret_cast(source); + LPCSTR typeName = Frame::GetFrameTypeName(pFrame->GetFrameIdentifier()); + if (typeName != nullptr) + snprintf(buf, bufLen, "", typeName, (unsigned long long)source); + else + snprintf(buf, bufLen, "", (unsigned long long)source); + return; + } + + ISOSDacInterface* pSos = s_cdacSosDac; + + if (pSos != nullptr) + { + CLRDATA_ADDRESS mdAddr = 0; + if (SUCCEEDED(pSos->GetMethodDescPtrFromIP(source, &mdAddr)) && mdAddr != 0) { - for (int i = 0; i < rawCdacCount; i++) - fprintf(s_logFile, " cDAC [%d]: Address=0x%llx Object=0x%llx Flags=0x%x Source=0x%llx SourceType=%d SP=0x%llx\n", - i, (unsigned long long)cdacRefs[i].Address, (unsigned long long)cdacRefs[i].Object, - cdacRefs[i].Flags, (unsigned long long)cdacRefs[i].Source, cdacRefs[i].SourceType, - (unsigned long 
long)cdacRefs[i].StackPointer); - if (haveDac) + WCHAR wname[256] = {}; + unsigned int nameLen = 0; + if (SUCCEEDED(pSos->GetMethodDescName(mdAddr, ARRAY_SIZE(wname), wname, &nameLen)) && nameLen > 0) { - for (int i = 0; i < rawDacCount; i++) - fprintf(s_logFile, " DAC [%d]: Address=0x%llx Object=0x%llx Flags=0x%x Source=0x%llx\n", - i, (unsigned long long)dacRefs[i].Address, (unsigned long long)dacRefs[i].Object, - dacRefs[i].Flags, (unsigned long long)dacRefs[i].Source); + WideCharToMultiByte(CP_UTF8, 0, wname, -1, buf, bufLen, NULL, NULL); + return; } - for (int i = 0; i < runtimeCount; i++) - fprintf(s_logFile, " RT [%d]: Address=0x%llx Object=0x%llx Flags=0x%x\n", - i, (unsigned long long)runtimeRefsBuf[i].Address, (unsigned long long)runtimeRefsBuf[i].Object, - runtimeRefsBuf[i].Flags); - - fflush(s_logFile); } } + + snprintf(buf, bufLen, "", (unsigned long long)source); } -#endif // HAVE_GCCOVER +#endif // CDAC_STRESS diff --git a/src/coreclr/vm/cdacstress.h b/src/coreclr/vm/cdacstress.h index b151155559e9c5..799a256f5a5caa 100644 --- a/src/coreclr/vm/cdacstress.h +++ b/src/coreclr/vm/cdacstress.h @@ -7,119 +7,60 @@ // Infrastructure for verifying cDAC stack reference reporting against the // runtime's own GC root enumeration at stress trigger points. // -// Enabled via DOTNET_CdacStress (bit flags) or legacy DOTNET_GCStress=0x20. +// Enabled via DOTNET_CdacStress // #ifndef _CDAC_STRESS_H_ #define _CDAC_STRESS_H_ +// Forward declarations +class Thread; + // Trigger points for cDAC stress verification. enum cdac_trigger_points { - cdac_on_alloc, // Verify at allocation points - cdac_on_gc, // Verify at GC trigger points - cdac_on_instr, // Verify at instruction-level stress points (needs GCStress=0x4) + cdac_on_alloc, // Verify at allocation points (gchelpers.cpp) }; -#ifdef HAVE_GCCOVER - -// Bit flags for DOTNET_CdacStress configuration. 
-// -// Low nibble: WHERE to trigger verification -// High nibble: WHAT to validate -// Modifier: HOW to filter -enum CdacStressFlags : DWORD +namespace CdacStressPolicy { - // Trigger points (low nibble — where stress fires) - CDACSTRESS_ALLOC = 0x1, // Verify at allocation points - CDACSTRESS_GC = 0x2, // Verify at GC trigger points (future) - CDACSTRESS_INSTR = 0x4, // Verify at instruction stress points (needs GCStress=0x4) - - // Validation types (high nibble — what to check) - CDACSTRESS_REFS = 0x10, // Compare GC stack references - CDACSTRESS_WALK = 0x20, // Compare IXCLRDataStackWalk frame-by-frame - CDACSTRESS_USE_DAC = 0x40, // Also load legacy DAC and compare cDAC against it + // Initialize the cDAC stress framework. No-op if DOTNET_CdacStress is unset. + // Idempotent. Called once early in EE startup. + void Initialize(); - // Modifiers - CDACSTRESS_UNIQUE = 0x100, // Only verify on unique (IP, SP) pairs -}; + // Tear down the framework, release the cDAC reader, flush logs, print summary. + void Shutdown(); +} -// Forward declarations -class Thread; +#if defined(CDAC_STRESS) && !defined(DACCESS_COMPILE) -// Accessor for the resolved stress level — called by template specializations. -DWORD GetCdacStressLevel(); +// Per-trigger class template. The primary template is intentionally empty -- +// only the explicit specializations below are usable. +template +class CdacStress; -class CdacStress +template<> +class CdacStress { public: - static bool Initialize(); - static void Shutdown(); - static bool IsInitialized(); - - // Returns true if cDAC stress is enabled via DOTNET_CdacStress or legacy GCSTRESS_CDAC. - static bool IsEnabled(); - - // Template-based trigger point check, following the GCStress pattern. - template + // Returns true if alloc-point cDAC stress is enabled (DOTNET_CdacStress has CDACSTRESS_ALLOC). static bool IsEnabled(); - // Returns true if unique-stack filtering is active. 
- static bool IsUniqueEnabled(); - - // Verify at a stress point if the given trigger is enabled and not skipped. - // Follows the GCStress::MaybeTrigger pattern — call sites are one-liners. - template - FORCEINLINE static void MaybeVerify(Thread* pThread, PCONTEXT regs) - { - if (IsEnabled() && !ShouldSkipStressPoint()) - VerifyAtStressPoint(pThread, regs); - } - - // Allocation-point variant: captures thread context automatically. - template - FORCEINLINE static void MaybeVerify() - { - if (IsEnabled() && !ShouldSkipStressPoint()) - VerifyAtAllocPoint(); - } - - // Main entry point: verify cDAC stack refs match runtime stack refs. - static void VerifyAtStressPoint(Thread* pThread, PCONTEXT regs); - - // Verify at an allocation point. Captures current thread context. - static void VerifyAtAllocPoint(); - - // Returns true if this stress point should be skipped (step throttling). - static bool ShouldSkipStressPoint(); + // Verify cDAC stack refs at an allocation point. Captures the current thread + // context internally and walks past the caller's frame. + static void MaybeVerify(); }; -template<> FORCEINLINE bool CdacStress::IsEnabled() -{ - return IsInitialized() && (GetCdacStressLevel() & CDACSTRESS_ALLOC) != 0; -} - -template<> FORCEINLINE bool CdacStress::IsEnabled() -{ - return IsInitialized() && (GetCdacStressLevel() & CDACSTRESS_GC) != 0; -} - -template<> FORCEINLINE bool CdacStress::IsEnabled() -{ - return IsInitialized() && (GetCdacStressLevel() & CDACSTRESS_INSTR) != 0; -} - -#else // !HAVE_GCCOVER +#else // active in runtime only -// Stub when HAVE_GCCOVER is not defined — all calls compile to nothing. +// Stubs for DAC builds and !CDAC_STRESS -- all calls compile to nothing. 
+template class CdacStress { public: - template - FORCEINLINE static void MaybeVerify(Thread* pThread, PCONTEXT regs) { } - template + FORCEINLINE static bool IsEnabled() { return false; } FORCEINLINE static void MaybeVerify() { } }; -#endif // HAVE_GCCOVER +#endif // CDAC_STRESS && !DACCESS_COMPILE #endif // _CDAC_STRESS_H_ diff --git a/src/coreclr/vm/ceemain.cpp b/src/coreclr/vm/ceemain.cpp index 70867dd457e08c..d9ab0c0c0d5326 100644 --- a/src/coreclr/vm/ceemain.cpp +++ b/src/coreclr/vm/ceemain.cpp @@ -988,10 +988,9 @@ void EEStartupHelper() #ifdef HAVE_GCCOVER MethodDesc::Init(); - if (CdacStress::IsEnabled()) - { - CdacStress::Initialize(); - } +#endif +#ifdef CDAC_STRESS + CdacStressPolicy::Initialize(); #endif Assembly::Initialize(); @@ -1273,8 +1272,8 @@ void STDMETHODCALLTYPE EEShutDownHelper(BOOL fIsDllUnloading) // Indicate the EE is the shut down phase. InterlockedOr((LONG*)&g_fEEShutDown, ShutDown_Start); -#ifdef HAVE_GCCOVER - CdacStress::Shutdown(); +#ifdef CDAC_STRESS + CdacStressPolicy::Shutdown(); #endif if (!IsAtProcessExit() && !g_fFastExitProcess) diff --git a/src/coreclr/vm/common.h b/src/coreclr/vm/common.h index d71ab91caa5838..61a7a486717027 100644 --- a/src/coreclr/vm/common.h +++ b/src/coreclr/vm/common.h @@ -315,6 +315,7 @@ namespace Loader #include "dynamicmethod.h" #include "gcstress.h" +#include "cdacstress.h" HRESULT EnsureRtlFunctions(); diff --git a/src/coreclr/vm/eeconfig.h b/src/coreclr/vm/eeconfig.h index 74683d485f1d1a..b9eddae5e9efbb 100644 --- a/src/coreclr/vm/eeconfig.h +++ b/src/coreclr/vm/eeconfig.h @@ -366,9 +366,7 @@ class EEConfig GCSTRESS_INSTR_JIT = 4, // GC on every allowable JITed instr GCSTRESS_INSTR_NGEN = 8, // GC on every allowable NGEN instr GCSTRESS_UNIQUE = 16, // GC only on a unique stack trace - GCSTRESS_CDAC = 32, // Verify cDAC GC references at stress points - // Excludes cDAC stress as it is fundamentally different from the other stress modes GCSTRESS_ALLSTRESS = GCSTRESS_ALLOC | GCSTRESS_TRANSITION | 
GCSTRESS_INSTR_JIT | GCSTRESS_INSTR_NGEN, }; diff --git a/src/coreclr/vm/gccover.cpp b/src/coreclr/vm/gccover.cpp index 64f22359891a57..007374953c6a6b 100644 --- a/src/coreclr/vm/gccover.cpp +++ b/src/coreclr/vm/gccover.cpp @@ -24,7 +24,6 @@ #include "gccover.h" #include "virtualcallstub.h" #include "threadsuspend.h" -#include "cdacstress.h" #if defined(TARGET_AMD64) || defined(TARGET_ARM) #include "gcinfodecoder.h" @@ -888,8 +887,6 @@ void DoGcStress (PCONTEXT regs, NativeCodeVersion nativeCodeVersion) // Do the actual stress work // - CdacStress::MaybeVerify(pThread, regs); - // BUG(github #10318) - when not using allocation contexts, the alloc lock // must be acquired here. Until fixed, this assert prevents random heap corruption. assert(GCHeapUtilities::UseThreadAllocationContexts()); @@ -1198,8 +1195,6 @@ void DoGcStress (PCONTEXT regs, NativeCodeVersion nativeCodeVersion) // Do the actual stress work // - CdacStress::MaybeVerify(pThread, regs); - // BUG(github #10318)- when not using allocation contexts, the alloc lock // must be acquired here. Until fixed, this assert prevents random heap corruption. assert(GCHeapUtilities::UseThreadAllocationContexts()); diff --git a/src/coreclr/vm/gchelpers.cpp b/src/coreclr/vm/gchelpers.cpp index c9e3020948fe3e..ce88ac5f44c2b5 100644 --- a/src/coreclr/vm/gchelpers.cpp +++ b/src/coreclr/vm/gchelpers.cpp @@ -412,8 +412,7 @@ inline Object* Alloc(ee_alloc_context* pEEAllocContext, size_t size, GC_ALLOC_FL } } - // Verify cDAC stack references before the allocation-triggered GC (while refs haven't moved). 
- CdacStress::MaybeVerify(); + CdacStress::MaybeVerify(); GCStress::MaybeTrigger(pAllocContext); @@ -481,7 +480,7 @@ inline Object* Alloc(size_t size, GC_ALLOC_FLAGS flags) if (GCHeapUtilities::UseThreadAllocationContexts()) { ee_alloc_context *threadContext = GetThreadEEAllocContext(); - CdacStress::MaybeVerify(); + CdacStress::MaybeVerify(); GCStress::MaybeTrigger(&threadContext->m_GCAllocContext); retVal = Alloc(threadContext, size, flags); } @@ -489,7 +488,7 @@ inline Object* Alloc(size_t size, GC_ALLOC_FLAGS flags) { GlobalAllocLockHolder holder(&g_global_alloc_lock); ee_alloc_context *globalContext = &g_global_alloc_context; - CdacStress::MaybeVerify(); + CdacStress::MaybeVerify(); GCStress::MaybeTrigger(&globalContext->m_GCAllocContext); retVal = Alloc(globalContext, size, flags); } diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IGCInfo.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IGCInfo.cs index 8a70ba34aae189..990af49d265f64 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IGCInfo.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IGCInfo.cs @@ -51,6 +51,7 @@ public interface IGCInfo : IContract uint GetCodeLength(IGCInfoHandle handle) => throw new NotImplementedException(); uint GetStackBaseRegister(IGCInfoHandle handle) => throw new NotImplementedException(); + uint GetSizeOfStackParameterArea(IGCInfoHandle handle) => throw new NotImplementedException(); IReadOnlyList GetInterruptibleRanges(IGCInfoHandle handle) => throw new NotImplementedException(); IReadOnlyList EnumerateLiveSlots(IGCInfoHandle handle, uint instructionOffset, GcSlotEnumerationOptions options) => throw new NotImplementedException(); } diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfoDecoder.cs 
b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfoDecoder.cs index aa7d919b8aa8d6..8dcded7b994d17 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfoDecoder.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfoDecoder.cs @@ -518,6 +518,12 @@ public uint GetStackBaseRegister() return _stackBaseRegister; } + public uint GetSizeOfStackParameterArea() + { + EnsureDecodedTo(DecodePoints.ReversePInvoke); + return _fixedStackParameterScratchArea; + } + public IReadOnlyList GetInterruptibleRanges() { EnsureDecodedTo(DecodePoints.InterruptibleRanges); @@ -841,9 +847,6 @@ private void ReportSlot(uint slotIndex, bool reportScratchSlots, bool reportFpBa } else { - // Skip scratch stack slots for non-leaf frames (slots in the outgoing/scratch area) - if (!reportScratchSlots && TTraits.IsScratchStackSlot(slot.SpOffset, (uint)slot.Base, _fixedStackParameterScratchArea)) - return; // FP-based-only mode: only report GC_FRAMEREG_REL slots if (reportFpBasedSlotsOnly && slot.Base != GcStackSlotBase.GC_FRAMEREG_REL) return; diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfo_1.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfo_1.cs index db1dc4dd79d519..ad261440db826b 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfo_1.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfo_1.cs @@ -34,6 +34,12 @@ uint IGCInfo.GetStackBaseRegister(IGCInfoHandle gcInfoHandle) return handle.GetStackBaseRegister(); } + uint IGCInfo.GetSizeOfStackParameterArea(IGCInfoHandle gcInfoHandle) + { + IGCInfoDecoder handle = AssertCorrectHandle(gcInfoHandle); + return handle.GetSizeOfStackParameterArea(); + } + IReadOnlyList 
IGCInfo.GetInterruptibleRanges(IGCInfoHandle gcInfoHandle) { IGCInfoDecoder handle = AssertCorrectHandle(gcInfoHandle); diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/IGCInfoDecoder.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/IGCInfoDecoder.cs index fcf7aa46c691bb..e6b146e4396284 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/IGCInfoDecoder.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/IGCInfoDecoder.cs @@ -11,6 +11,7 @@ internal interface IGCInfoDecoder : IGCInfoHandle { uint GetCodeLength(); uint GetStackBaseRegister(); + uint GetSizeOfStackParameterArea(); IReadOnlyList GetInterruptibleRanges(); IReadOnlyList EnumerateLiveSlots(uint instructionOffset, GcSlotEnumerationOptions options); } diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/AMD64GCInfoTraits.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/AMD64GCInfoTraits.cs index 023a5d4b191bdc..6d6ae2b57c7385 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/AMD64GCInfoTraits.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/AMD64GCInfoTraits.cs @@ -58,15 +58,4 @@ public static bool IsScratchRegister(uint regNum) return (preservedMask & (1u << (int)regNum)) == 0; } - // AMD64 has a fixed stack parameter scratch area (shadow space + outgoing args). - // Stack slots with GC_SP_REL base and offset in [0, scratchAreaSize) are scratch slots. - // This matches the native IsScratchStackSlot which computes GetStackSlot and checks - // pSlot < pRD->SP + m_SizeOfStackOutgoingAndScratchArea. 
- public static bool IsScratchStackSlot(int spOffset, uint spBase, uint fixedStackParameterScratchArea) - { - // GC_SP_REL = 1 - return spBase == 1 - && spOffset >= 0 - && (uint)spOffset < fixedStackParameterScratchArea; - } } diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/ARM64GCInfoTraits.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/ARM64GCInfoTraits.cs index 6381e22861124b..730447950b2fb7 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/ARM64GCInfoTraits.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/ARM64GCInfoTraits.cs @@ -44,14 +44,4 @@ internal class ARM64GCInfoTraits : IGCInfoTraits // Preserved (non-scratch): x19-x28 // Scratch: x0-x17, x29(FP), x30(LR) public static bool IsScratchRegister(uint regNum) => regNum <= 17 || regNum >= 29; - - // ARM64 has a fixed stack parameter scratch area. - // Stack slots with GC_SP_REL base in [0, scratchAreaSize) are scratch slots. 
- public static bool IsScratchStackSlot(int spOffset, uint spBase, uint fixedStackParameterScratchArea) - { - // GC_SP_REL = 1 - return spBase == 1 - && spOffset >= 0 - && (uint)spOffset < fixedStackParameterScratchArea; - } } diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/ARMGCInfoTraits.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/ARMGCInfoTraits.cs index ebbb953d0c2d7d..c26261f7e4fb6e 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/ARMGCInfoTraits.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/ARMGCInfoTraits.cs @@ -44,14 +44,4 @@ internal class ARMGCInfoTraits : IGCInfoTraits // Preserved (non-scratch): r4-r11 (and r14/LR is special) // Scratch: r0-r3, r12, r14 public static bool IsScratchRegister(uint regNum) => regNum <= 3 || regNum == 12 || regNum == 14; - - // ARM has a fixed stack parameter scratch area. - // Stack slots with GC_SP_REL base in [0, scratchAreaSize) are scratch slots. 
- public static bool IsScratchStackSlot(int spOffset, uint spBase, uint fixedStackParameterScratchArea) - { - // GC_SP_REL = 1 - return spBase == 1 - && spOffset >= 0 - && (uint)spOffset < fixedStackParameterScratchArea; - } } diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/IGCInfoTraits.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/IGCInfoTraits.cs index 2d3c5faf59579a..24c8f80f4ba703 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/IGCInfoTraits.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/PlatformTraits/IGCInfoTraits.cs @@ -54,14 +54,6 @@ internal interface IGCInfoTraits /// static abstract bool IsScratchRegister(uint regNum); - /// - /// Returns true if a stack slot at the given offset and base is in the scratch/outgoing area. - /// Scratch stack slots should only be reported for the active (leaf) stack frame. - /// spBase uses the GcStackSlotBase encoding: 0=CALLER_SP_REL, 1=SP_REL, 2=FRAMEREG_REL. 
- /// - static virtual bool IsScratchStackSlot(int spOffset, uint spBase, uint fixedStackParameterScratchArea) - => false; - // These are the same across all platforms static virtual int POINTER_SIZE_ENCBASE { get; } = 3; static virtual int LIVESTATE_RLE_RUN_ENCBASE { get; } = 2; diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanContext.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanContext.cs index 184a875c908980..1a2b8b5b2c226e 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanContext.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanContext.cs @@ -29,6 +29,22 @@ public void UpdateScanContext(TargetPointer sp, TargetPointer ip, TargetPointer Frame = frame; } + public void RecordDeferredFrame(TargetPointer frameAddress) + { + StackRefs.Add(new StackRefData + { + HasRegisterInformation = false, + Register = 0, + Offset = 0, + Address = 0, + Object = 0, + Flags = GcScanFlags.CDAC_DEFERRED_FRAME, + SourceType = StackRefData.SourceTypes.StackSourceFrame, + Source = frameAddress, + StackPointer = StackPointer, + }); + } + public void GCEnumCallback(TargetPointer pObject, GcScanFlags flags, GcScanSlotLocation loc) { TargetPointer addr; diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanFlags.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanFlags.cs index 0575b625d5b9d4..ced0a9f76c79ea 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanFlags.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanFlags.cs @@ -11,4 +11,9 @@ internal enum GcScanFlags None = 0x0, 
GC_CALL_INTERIOR = 0x1, GC_CALL_PINNED = 0x2, + + // cDAC-private sentinel: this StackRefData is not a real GC reference but + // a marker that an explicit Frame at `Source` was deliberately skipped by + // the cDAC because the code path required is not implemented yet. + CDAC_DEFERRED_FRAME = 0x40000000, } diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanner.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanner.cs index 91a87b66bd6d82..c94e7b66757a9d 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanner.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanner.cs @@ -46,6 +46,8 @@ public void EnumGcRefsForManagedFrame( IGCInfoHandle handle = _gcInfo.DecodePlatformSpecificGCInfo(gcInfoAddr, gcVersion); uint stackBaseRegister = _gcInfo.GetStackBaseRegister(handle); + uint scratchAreaSize = _gcInfo.GetSizeOfStackParameterArea(handle); + bool filterScratchStackSlots = !options.IsActiveFrame; TargetPointer? callerSP = null; uint offsetToUse = relOffsetOverride ?? (uint)relativeOffset.Value; @@ -86,6 +88,19 @@ public void EnumGcRefsForManagedFrame( }; TargetPointer addr = new(baseAddr.Value + (ulong)(long)slot.SpOffset); + + // Mirror native IsScratchStackSlot (gcinfodecoder.cpp, post-PR #119446 unified form): + // for non-leaf frames, drop any stack slot whose resolved address lies in the + // outgoing/scratch area [SP, SP + SizeOfStackOutgoingAndScratchArea). This applies + // to all stack base kinds (GC_SP_REL, GC_FRAMEREG_REL, GC_CALLER_SP_REL) because + // the filter is address-based, not offset-based. 
+ if (filterScratchStackSlots && scratchAreaSize > 0) + { + ulong sp = context.StackPointer.Value; + if (addr.Value >= sp && addr.Value < sp + scratchAreaSize) + continue; + } + GcScanSlotLocation loc = new(reg, slot.SpOffset, true); scanContext.GCEnumCallback(addr, scanFlags, loc); } @@ -118,7 +133,7 @@ public void GcScanRoots(TargetPointer frameAddress, GcScanContext scanContext) if (gcRefMap != TargetPointer.Null) PromoteCallerStackUsingGCRefMap(fmf.TransitionBlockPtr, gcRefMap, scanContext); else - PromoteCallerStack(frameAddress, fmf.TransitionBlockPtr, scanContext); + PromoteCallerStack(frameAddress, scanContext); break; } @@ -134,7 +149,7 @@ public void GcScanRoots(TargetPointer frameAddress, GcScanContext scanContext) if (gcRefMap != TargetPointer.Null) PromoteCallerStackUsingGCRefMap(fmf.TransitionBlockPtr, gcRefMap, scanContext); else - PromoteCallerStack(frameAddress, fmf.TransitionBlockPtr, scanContext); + PromoteCallerStack(frameAddress, scanContext); break; } @@ -149,8 +164,7 @@ public void GcScanRoots(TargetPointer frameAddress, GcScanContext scanContext) case FrameType.CallCountingHelperFrame: case FrameType.PrestubMethodFrame: { - Data.FramedMethodFrame fmf = _target.ProcessedData.GetOrAdd(frameAddress); - PromoteCallerStack(frameAddress, fmf.TransitionBlockPtr, scanContext); + PromoteCallerStack(frameAddress, scanContext); break; } @@ -316,147 +330,15 @@ private TargetPointer FindGCRefMap(TargetPointer indirection) /// Entry point for promoting caller stack GC references via method signature. /// Matches native TransitionFrame::PromoteCallerStack (frames.cpp:1494). 
/// - private void PromoteCallerStack( - TargetPointer frameAddress, - TargetPointer transitionBlock, - GcScanContext scanContext) - { - Data.FramedMethodFrame fmf = _target.ProcessedData.GetOrAdd(frameAddress); - TargetPointer methodDescPtr = fmf.MethodDescPtr; - if (methodDescPtr == TargetPointer.Null) - return; - - IRuntimeTypeSystem rts = _target.Contracts.RuntimeTypeSystem; - MethodDescHandle mdh = rts.GetMethodDescHandle(methodDescPtr); - - MethodSignature methodSig; - try - { - TargetPointer methodTablePtr = rts.GetMethodTable(mdh); - TypeHandle typeHandle = rts.GetTypeHandle(methodTablePtr); - TargetPointer modulePtr = rts.GetModule(typeHandle); - - ModuleHandle moduleHandle = _target.Contracts.Loader.GetModuleHandleFromModulePtr(modulePtr); - MetadataReader? mdReader = _target.Contracts.EcmaMetadata.GetMetadata(moduleHandle); - if (mdReader is null) - return; - - GcSignatureTypeProvider provider = new(_target, moduleHandle); - GcSignatureContext genericContext = new(typeHandle, mdh); - RuntimeSignatureDecoder decoder = new( - provider, _target, mdReader, genericContext); - - // Match native MethodDesc::GetSig: prefer stored signature (dynamic, EEImpl, - // and array method descs) before falling back to a metadata token lookup. 
- if (rts.IsStoredSigMethodDesc(mdh, out ReadOnlySpan storedSig)) - { - unsafe - { - fixed (byte* pStoredSig = storedSig) - { - BlobReader blobReader = new BlobReader(pStoredSig, storedSig.Length); - methodSig = decoder.DecodeMethodSignature(ref blobReader); - } - } - } - else - { - uint methodToken = rts.GetMethodToken(mdh); - if (methodToken == (uint)EcmaMetadataUtils.TokenType.mdtMethodDef) - return; - - MethodDefinitionHandle methodDefHandle = MetadataTokens.MethodDefinitionHandle((int)EcmaMetadataUtils.GetRowId(methodToken)); - MethodDefinition methodDef = mdReader.GetMethodDefinition(methodDefHandle); - - BlobReader blobReader = mdReader.GetBlobReader(methodDef.Signature); - methodSig = decoder.DecodeMethodSignature(ref blobReader); - } - } - catch (System.Exception) - { - return; - } - - if (methodSig.Header.CallingConvention is SignatureCallingConvention.VarArgs) - return; - - bool hasThis = methodSig.Header.IsInstance; - bool hasRetBuf = methodSig.ReturnType is GcTypeKind.Other; - bool requiresInstArg = false; - bool isAsync = false; - bool isValueTypeThis = false; - - try - { - requiresInstArg = rts.GetGenericContextLoc(mdh) is GenericContextLoc.InstArgMethodDesc or GenericContextLoc.InstArgMethodTable; - isAsync = rts.IsAsyncMethod(mdh); - } - catch - { - } - - PromoteCallerStackHelper(transitionBlock, methodSig, hasThis, hasRetBuf, - requiresInstArg, isAsync, isValueTypeThis, scanContext); - } - - /// - /// Core logic for promoting caller stack GC references. - /// Matches native TransitionFrame::PromoteCallerStackHelper (frames.cpp:1560). - /// - private void PromoteCallerStackHelper( - TargetPointer transitionBlock, - MethodSignature methodSig, - bool hasThis, - bool hasRetBuf, - bool requiresInstArg, - bool isAsync, - bool isValueTypeThis, - GcScanContext scanContext) + /// + /// Not yet ported. 
Every call records a deferred frame so the stress harness + /// buckets the resulting cDAC-vs-runtime diff at this frame as a known issue + /// rather than a real cDAC bug. Will be replaced with a real port once the + /// signature- and ArgIterator-based ref enumeration lands. + /// + private static void PromoteCallerStack(TargetPointer frameAddress, GcScanContext scanContext) { - Data.TransitionBlock tb = _target.ProcessedData.GetOrAdd(transitionBlock); - - int numRegistersUsed = 0; - if (hasThis) - numRegistersUsed++; - if (hasRetBuf) - numRegistersUsed++; - if (requiresInstArg) - numRegistersUsed++; - if (isAsync) - numRegistersUsed++; - - bool isArm64 = IsTargetArm64(); - if (isArm64) - numRegistersUsed++; - - if (hasThis) - { - int thisPos = isArm64 ? 1 : 0; - TargetPointer thisAddr = AddressFromGCRefMapPos(tb, thisPos); - GcScanFlags thisFlags = isValueTypeThis ? GcScanFlags.GC_CALL_INTERIOR : GcScanFlags.None; - scanContext.GCReportCallback(thisAddr, thisFlags); - } - - int pos = numRegistersUsed; - foreach (GcTypeKind kind in methodSig.ParameterTypes) - { - TargetPointer slotAddress = AddressFromGCRefMapPos(tb, pos); - - switch (kind) - { - case GcTypeKind.Ref: - scanContext.GCReportCallback(slotAddress, GcScanFlags.None); - break; - case GcTypeKind.Interior: - scanContext.GCReportCallback(slotAddress, GcScanFlags.GC_CALL_INTERIOR); - break; - case GcTypeKind.Other: - break; - case GcTypeKind.None: - break; - } - pos++; - } + scanContext.RecordDeferredFrame(frameAddress); } private TargetPointer AddressFromGCRefMapPos(Data.TransitionBlock tb, int pos) @@ -464,11 +346,6 @@ private TargetPointer AddressFromGCRefMapPos(Data.TransitionBlock tb, int pos) return new TargetPointer(tb.FirstGCRefMapSlot.Value + (ulong)(pos * _target.PointerSize)); } - private bool IsTargetArm64() - { - return _target.Contracts.RuntimeInfo.GetTargetArchitecture() is RuntimeInfoArchitecture.Arm64; - } - private TargetPointer GetCallerSP(IPlatformAgnosticContext context, ref 
TargetPointer? cached) { if (cached is null) diff --git a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.IXCLRDataProcess.cs b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.IXCLRDataProcess.cs index a0840f0362f2ff..cc400480fbd6b2 100644 --- a/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.IXCLRDataProcess.cs +++ b/src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.IXCLRDataProcess.cs @@ -19,6 +19,8 @@ namespace Microsoft.Diagnostics.DataContractReader.Legacy; /// public sealed unsafe partial class SOSDacImpl : IXCLRDataProcess, IXCLRDataProcess2 { + private const uint DacStressPrivRequestFlushTargetState = 0xf2000000; + int IXCLRDataProcess.Flush() { _target.Flush(FlushScope.All); @@ -755,12 +757,22 @@ int IXCLRDataProcess.Request(uint reqCode, uint inBufferSize, byte* inBuffer, ui hr = HResults.S_OK; } } + else if (reqCode == DacStressPrivRequestFlushTargetState) + { + if (inBufferSize == 0 && inBuffer is null && outBufferSize == 0 && outBuffer is null) + { + _target.Flush(FlushScope.ForwardExecution); + hr = HResults.S_OK; + } + } else { return LegacyFallbackHelper.CanFallback() && _legacyProcess is not null ? _legacyProcess.Request(reqCode, inBufferSize, inBuffer, outBufferSize, outBuffer) : HResults.E_NOTIMPL; } #if DEBUG - if (_legacyProcess is not null) + // The private DACSTRESSPRIV_REQUEST_FLUSH_TARGET_STATE opcode is cDAC-only + // and must NOT be forwarded to the legacy DAC. 
+ if (_legacyProcess is not null && reqCode != DacStressPrivRequestFlushTargetState) { byte[] localBuffer = new byte[(int)outBufferSize]; fixed (byte* localOutBuffer = localBuffer) diff --git a/src/native/managed/cdac/README.md b/src/native/managed/cdac/README.md index 94f3e4aa861fd1..6a0648bd55feae 100644 --- a/src/native/managed/cdac/README.md +++ b/src/native/managed/cdac/README.md @@ -51,44 +51,6 @@ ISOSDacInterface* / IXCLRDataProcess (COM-style API surface) | `mscordaccore_universal` | Entry point that wires everything together | | `tests` | Unit tests with mock memory infrastructure | -## GC Stress Verification (GCSTRESS_CDAC) - -The cDAC includes a GC stress verification mode that compares the cDAC's stack reference -enumeration against the runtime's own GC root scanning at every GC stress instruction-level -trigger point. - -### How it works - -When `DOTNET_CdacStress` is set, at each configured stress point: -1. The cDAC is loaded in-process and enumerates stack GC references via `GetStackReferences` -2. The runtime enumerates the same references via `StackWalkFrames` + `GcStackCrawlCallBack` -3. 
The tool compares the two sets and reports mismatches - -### Usage - -```bash -DOTNET_CdacStress=0x11 DOTNET_CdacStressLogFile=results.txt corerun test.dll -``` - -Configuration variables: -- `DOTNET_CdacStress=0x11` — Enable alloc-point cDAC verification (see [StressTests README](tests/StressTests/README.md) for all flags) -- `DOTNET_CdacStressFailFast=1` — Assert on mismatch (default: log and continue) -- `DOTNET_CdacStressLogFile=` — Write detailed results to a log file - -### Files - -| File | Location | Purpose | -|------|----------|---------| -| `cdacstress.h/cpp` | `src/coreclr/vm/` | In-process cDAC loading and comparison | -| `RunStressTests.ps1` | `src/native/managed/cdac/tests/StressTests/` | Build and test script | -| `known-issues.md` | `src/native/managed/cdac/tests/StressTests/` | Documented gaps | - -### Known limitations - -See [tests/StressTests/known-issues.md](tests/StressTests/known-issues.md) for the full list. -Key gaps include explicit frame GC root scanning (ScanFrameRoots) for stub frames. -Current pass rate: ~99.5%. - ## Contract specifications Each contract has a specification document in diff --git a/src/native/managed/cdac/tests/StressTests/BasicCdacStressTests.cs b/src/native/managed/cdac/tests/StressTests/BasicCdacStressTests.cs new file mode 100644 index 00000000000000..a5270ed023101b --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/BasicCdacStressTests.cs @@ -0,0 +1,66 @@ +// Licensed to the .NET Foundation under one or more agreements. +// The .NET Foundation licenses this file to you under the MIT license. 
+ +using System.Collections.Generic; +using System.Runtime.InteropServices; +using System.Threading.Tasks; +using Microsoft.DotNet.XUnitExtensions; +using Xunit; +using Xunit.Abstractions; + +namespace Microsoft.Diagnostics.DataContractReader.Tests.GCStress; + +/// +/// Runs each debuggee app under corerun with DOTNET_CdacStress=0x001 (ALLOC) +/// and asserts that the cDAC stack reference verification produces no +/// `[FAIL]` results. `[KNOWN_ISSUE]` verifications (where the cDAC explicitly +/// marks a frame as deferred via `RecordDeferredFrame`) are tolerated. +/// +/// +/// Prerequisites: +/// - Build CoreCLR + cDAC (Checked): build.cmd -subset clr.runtime+tools.cdac -c Checked +/// - Generate core_root: src\tests\build.cmd Checked generatelayoutonly /p:LibrariesConfiguration=Release +/// - Build debuggees: dotnet build this test project +/// +/// The tests use CORE_ROOT env var if set, otherwise default to the standard artifacts path. +/// +public class BasicStressTests : CdacStressTestBase +{ + public BasicStressTests(ITestOutputHelper output) : base(output) { } + + public static IEnumerable Debuggees => + [ + ["BasicAlloc"], + ["DeepStack"], + ["Generics"], + ["MultiThread"], + ["Comprehensive"], + ["ExceptionHandling"], + ["StructScenarios"], + ["DynamicMethods"], + ]; + + public static IEnumerable WindowsOnlyDebuggees => + [ + ["PInvoke"], + ]; + + [Theory] + [MemberData(nameof(Debuggees))] + public async Task GCStress_AllVerificationsPass(string debuggeeName) + { + CdacStressResults results = await RunGCStressAsync(debuggeeName); + AssertAllPassed(results, debuggeeName); + } + + [ConditionalTheory] + [MemberData(nameof(WindowsOnlyDebuggees))] + public async Task GCStress_WindowsOnly_AllVerificationsPass(string debuggeeName) + { + if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) + throw new SkipTestException("P/Invoke debuggee uses kernel32.dll (Windows only)"); + + CdacStressResults results = await RunGCStressAsync(debuggeeName); + 
AssertAllPassed(results, debuggeeName); + } +} diff --git a/src/native/managed/cdac/tests/StressTests/CdacStressResults.cs b/src/native/managed/cdac/tests/StressTests/CdacStressResults.cs new file mode 100644 index 00000000000000..05b55c0918ad96 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/CdacStressResults.cs @@ -0,0 +1,287 @@ +// Licensed to the .NET Foundation under one or more agreements. +// The .NET Foundation licenses this file to you under the MIT license. + +using System.Collections.Generic; +using System.Globalization; +using System.IO; +using System.Linq; +using System.Text; +using System.Text.RegularExpressions; + +namespace Microsoft.Diagnostics.DataContractReader.Tests.GCStress; + +/// +/// Parses the cdac stress results log file written by the native cdacstress.cpp hook. +/// +/// +/// Native emission format (see src/coreclr/vm/cdacstress.cpp): +/// +/// [PASS] Thread=... IP=... cDAC=N RT=N frames=N +/// [FAIL] or [KNOWN_ISSUE] Thread=... IP=... cDAC=N RT=N frames=N (match=N mismatch=N known_nie=N), followed by: +/// Frame #N <method> [MISMATCH|KNOWN_NIE] cDAC=N RT=N SP_cDAC=0x... SP_RT=0x... [<-- SP MISMATCH] [(truncated)] +/// [MATCHED|ONLY|NIE(cDAC|RT)] Addr=0x... Obj=0x... Flags=0x... Reg=N Off=N (concise) or with extra HasReg/Reg-name/SP fields (verbose) +/// [STACK_TRACE] (cDAC=N RT=N frames=N) followed by #N <method> (cDAC=N RT=N)[ <-- MISMATCH|KNOWN_NIE ...] 
+/// Total verifications: N (final summary) +/// +/// +internal sealed partial class CdacStressResults +{ + public int TotalVerifications { get; private set; } + public int Passed { get; private set; } + public int Failed { get; private set; } + public int KnownIssues { get; private set; } + public string LogFilePath { get; private set; } = string.Empty; + public List FailureDetails { get; } = []; + public List FailedVerifications { get; } = []; + + [GeneratedRegex(@"^\[PASS\]")] + private static partial Regex PassPattern(); + + [GeneratedRegex(@"^\[FAIL\]")] + private static partial Regex FailPattern(); + + [GeneratedRegex(@"^\[KNOWN_ISSUE\]")] + private static partial Regex KnownIssuePattern(); + + [GeneratedRegex(@"^Total verifications:\s*(\d+)")] + private static partial Regex TotalPattern(); + + // "Frame #3 SomeMethod [MISMATCH] cDAC=4 RT=3 SP_cDAC=0x7ff... SP_RT=0x7ff... <-- SP MISMATCH (truncated)" + // The method name is non-greedy and bounded by " [MISMATCH]" or " [KNOWN_NIE]" so embedded brackets in the name + // (unlikely but possible for generics) don't confuse the match. + [GeneratedRegex(@"^Frame\s+#(\d+)\s+(.+?)\s+\[(MISMATCH|KNOWN_NIE)\]\s+cDAC=(\d+)\s+RT=(\d+)\s+SP_cDAC=0x([0-9a-fA-F]+)\s+SP_RT=0x([0-9a-fA-F]+)(.*)$")] + private static partial Regex FrameHeaderPattern(); + + // "[MATCHED(cDAC)] Addr=0x... Obj=0x... Flags=0x...[ Reg=N Off=N | HasReg=Y Reg=name(N) Off=N SP=0x...]" + // Concise and verbose lines share the prefix; we only capture what AnalyzeFailures needs. + [GeneratedRegex(@"^\[(MATCHED|ONLY|NIE)\((cDAC|RT)\)\]\s+Addr=0x([0-9a-fA-F]+)\s+Obj=0x([0-9a-fA-F]+)\s+Flags=(\S+)")] + private static partial Regex RefPattern(); + + // "[STACK_TRACE] (cDAC=N RT=N frames=N)" -- section opener; we only use it as a state hint. 
+ [GeneratedRegex(@"^\[STACK_TRACE\]")] + private static partial Regex StackTraceHeaderPattern(); + + // "#N <method> (cDAC=N RT=N)[ <-- MISMATCH | <-- KNOWN_NIE (...)]" + [GeneratedRegex(@"^#\d+\s+.+?\s+\(cDAC=\d+\s+RT=\d+\)")] + private static partial Regex StackTraceLinePattern(); + + public static CdacStressResults Parse(string logFilePath) + { + if (!File.Exists(logFilePath)) + throw new FileNotFoundException($"GC stress results log not found: {logFilePath}"); + + var results = new CdacStressResults { LogFilePath = logFilePath }; + FailedVerification? currentFailure = null; + FrameDiff? currentFrame = null; + bool inStackTrace = false; + + foreach (string line in File.ReadLines(logFilePath)) + { + string trimmed = line.Trim(); + + if (PassPattern().IsMatch(trimmed)) + { + currentFailure = null; + currentFrame = null; + inStackTrace = false; + results.Passed++; + continue; + } + + if (FailPattern().IsMatch(trimmed) || KnownIssuePattern().IsMatch(trimmed)) + { + bool isKnown = KnownIssuePattern().IsMatch(trimmed); + if (isKnown) + results.KnownIssues++; + else + results.Failed++; + results.FailureDetails.Add(trimmed); + currentFailure = new FailedVerification { Header = trimmed, IsKnownIssue = isKnown }; + results.FailedVerifications.Add(currentFailure); + currentFrame = null; + inStackTrace = false; + continue; + } + + Match totalMatch = TotalPattern().Match(trimmed); + if (totalMatch.Success) + { + results.TotalVerifications = int.Parse(totalMatch.Groups[1].Value, CultureInfo.InvariantCulture); + continue; + } + + if (currentFailure is null) + continue; + + if (StackTraceHeaderPattern().IsMatch(trimmed)) + { + inStackTrace = true; + currentFrame = null; + continue; + } + + if (inStackTrace) + { + if (StackTraceLinePattern().IsMatch(trimmed)) + currentFailure.StackTrace.Add(trimmed); + continue; + } + + Match frameMatch = FrameHeaderPattern().Match(trimmed); + if (frameMatch.Success) + { + currentFrame = new FrameDiff + { + Index = int.Parse(frameMatch.Groups[1].Value, 
CultureInfo.InvariantCulture), + MethodName = frameMatch.Groups[2].Value, + Outcome = frameMatch.Groups[3].Value == "MISMATCH" ? FrameOutcome.Mismatch : FrameOutcome.KnownNie, + CdacCount = int.Parse(frameMatch.Groups[4].Value, CultureInfo.InvariantCulture), + RtCount = int.Parse(frameMatch.Groups[5].Value, CultureInfo.InvariantCulture), + SpCdac = ulong.Parse(frameMatch.Groups[6].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture), + SpRt = ulong.Parse(frameMatch.Groups[7].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture), + SpMismatch = frameMatch.Groups[8].Value.Contains("SP MISMATCH"), + Truncated = frameMatch.Groups[8].Value.Contains("(truncated)"), + }; + currentFailure.FrameDiffs.Add(currentFrame); + continue; + } + + Match refMatch = RefPattern().Match(trimmed); + if (refMatch.Success && currentFrame is not null) + { + var stackRef = new StackRef + { + Disposition = ParseDisposition(refMatch.Groups[1].Value), + Side = refMatch.Groups[2].Value == "cDAC" ? RefSide.Cdac : RefSide.Rt, + Address = ulong.Parse(refMatch.Groups[3].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture), + Object = ulong.Parse(refMatch.Groups[4].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture), + Flags = refMatch.Groups[5].Value, + }; + currentFrame.Refs.Add(stackRef); + } + } + + if (results.TotalVerifications == 0) + { + results.TotalVerifications = results.Passed + results.Failed + results.KnownIssues; + } + + return results; + } + + private static RefDisposition ParseDisposition(string value) => value switch + { + "MATCHED" => RefDisposition.Matched, + "ONLY" => RefDisposition.Only, + "NIE" => RefDisposition.Nie, + _ => RefDisposition.Unknown, + }; + + public override string ToString() => + $"Total={TotalVerifications}, Passed={Passed}, Failed={Failed}, KnownIssues={KnownIssues}"; + + /// + /// Formats the first N failed verifications using the structured per-frame data + /// logged by the native code. 
No re-analysis needed -- just presents what was logged. + /// + public string AnalyzeFailures(int maxFailures = 3) + { + var sb = new StringBuilder(); + + foreach (FailedVerification failure in FailedVerifications.Take(maxFailures)) + { + sb.AppendLine(failure.Header); + + foreach (FrameDiff frame in failure.FrameDiffs) + { + string outcomeLabel = frame.Outcome == FrameOutcome.Mismatch ? "MISMATCH" : "KNOWN_NIE"; + string suffix = (frame.SpMismatch ? " <-- SP MISMATCH" : string.Empty) + + (frame.Truncated ? " (truncated)" : string.Empty); + sb.AppendLine( + $" Frame #{frame.Index} {frame.MethodName} [{outcomeLabel}] cDAC={frame.CdacCount} RT={frame.RtCount}" + + $" SP_cDAC=0x{frame.SpCdac:X} SP_RT=0x{frame.SpRt:X}{suffix}"); + + // Only divergent refs (ONLY / NIE) are interesting in the summary. MATCHED refs + // are logged in verbose mode only and would just add noise here. + foreach (StackRef r in frame.Refs.Where(static x => x.Disposition != RefDisposition.Matched)) + { + sb.AppendLine( + $" [{DispositionLabel(r.Disposition)}({SideLabel(r.Side)})] " + + $"Addr=0x{r.Address:X} Obj=0x{r.Object:X} Flags={r.Flags}"); + } + } + + if (failure.StackTrace.Count > 0) + { + sb.AppendLine(" Stack trace:"); + foreach (string frame in failure.StackTrace) + sb.AppendLine($" {frame}"); + } + + sb.AppendLine(); + } + + return sb.ToString(); + } + + private static string DispositionLabel(RefDisposition d) => d switch + { + RefDisposition.Matched => "MATCHED", + RefDisposition.Only => "ONLY", + RefDisposition.Nie => "NIE", + _ => "?", + }; + + private static string SideLabel(RefSide s) => s == RefSide.Cdac ? 
"cDAC" : "RT"; +} + +internal enum RefDisposition +{ + Unknown, + Matched, + Only, + Nie, +} + +internal enum RefSide +{ + Cdac, + Rt, +} + +internal enum FrameOutcome +{ + Mismatch, + KnownNie, +} + +internal struct StackRef +{ + public RefDisposition Disposition; + public RefSide Side; + public ulong Address; + public ulong Object; + public string Flags; +} + +internal sealed class FrameDiff +{ + public int Index { get; set; } + public string MethodName { get; set; } = string.Empty; + public FrameOutcome Outcome { get; set; } + public int CdacCount { get; set; } + public int RtCount { get; set; } + public ulong SpCdac { get; set; } + public ulong SpRt { get; set; } + public bool SpMismatch { get; set; } + public bool Truncated { get; set; } + public List<StackRef> Refs { get; } = []; +} + +internal sealed class FailedVerification +{ + public string Header { get; set; } = string.Empty; + public bool IsKnownIssue { get; set; } + public List<FrameDiff> FrameDiffs { get; } = []; + public List<string> StackTrace { get; } = []; +} diff --git a/src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs b/src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs new file mode 100644 index 00000000000000..4e617f1c2d1589 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs @@ -0,0 +1,210 @@ +// Licensed to the .NET Foundation under one or more agreements. +// The .NET Foundation licenses this file to you under the MIT license. + +using System; +using System.Diagnostics; +using System.IO; +using System.Linq; +using System.Runtime.InteropServices; +using System.Threading; +using System.Threading.Tasks; +using Xunit; +using Xunit.Abstractions; + +namespace Microsoft.Diagnostics.DataContractReader.Tests.GCStress; + +/// +/// Base class for cDAC stress tests. Runs a debuggee app under corerun +/// with DOTNET_CdacStress=0x001 (ALLOC) and parses the verification results. 
+/// +public abstract class CdacStressTestBase +{ + private readonly ITestOutputHelper _output; + + protected CdacStressTestBase(ITestOutputHelper output) + { + _output = output; + } + + /// + /// Runs the named debuggee under GC stress and returns the parsed results. + /// + internal async Task<CdacStressResults> RunGCStressAsync(string debuggeeName, int timeoutSeconds = 300) + { + string coreRoot = GetCoreRoot(); + string corerun = Path.Combine(coreRoot, OperatingSystem.IsWindows() ? "corerun.exe" : "corerun"); + Assert.True(File.Exists(corerun), $"corerun not found at '{corerun}'"); + + string debuggeeDll = GetDebuggeePath(debuggeeName); + // When running on Helix, write logs into HELIX_WORKITEM_UPLOAD_ROOT so + // they're uploaded as work-item artifacts and visible via the Helix API. + // Locally, fall back to the system temp directory. + string logDir = Environment.GetEnvironmentVariable("HELIX_WORKITEM_UPLOAD_ROOT") + ?? Path.GetTempPath(); + string logFile = Path.Combine(logDir, $"cdac-gcstress-{debuggeeName}-{Guid.NewGuid():N}.txt"); + + _output.WriteLine($"Running GC stress: {debuggeeName}"); + _output.WriteLine($" corerun: {corerun}"); + _output.WriteLine($" debuggee: {debuggeeDll}"); + _output.WriteLine($" log: {logFile}"); + + var psi = new ProcessStartInfo + { + FileName = corerun, + Arguments = $"\"{debuggeeDll}\"", + UseShellExecute = false, + RedirectStandardOutput = true, + RedirectStandardError = true, + }; + psi.Environment["CORE_ROOT"] = coreRoot; + // Verifies every stress hit. We rely on the debuggee's own iteration + // count to keep test time bounded. + psi.Environment["DOTNET_CdacStress"] = "0x001"; + psi.Environment["DOTNET_CdacStressFailFast"] = "0"; + psi.Environment["DOTNET_CdacStressLogFile"] = logFile; + psi.Environment["DOTNET_ContinueOnAssert"] = "1"; + + using var process = Process.Start(psi)!; + + // Drain stdout/stderr concurrently with WaitForExit so pipe buffers can't + // deadlock, and so a timeout cancels all three waits via one CTS. 
+ // ReadToEndAsync returns only after the pipe is closed at process exit, + // so we can't lose trailing output the way BeginOutputReadLine + a manual + // drain can. + using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(timeoutSeconds)); + Task<string> stdoutTask = process.StandardOutput.ReadToEndAsync(cts.Token); + Task<string> stderrTask = process.StandardError.ReadToEndAsync(cts.Token); + + try + { + await process.WaitForExitAsync(cts.Token); + } + catch (OperationCanceledException) + { + process.Kill(entireProcessTree: true); + Assert.Fail($"GC stress test '{debuggeeName}' timed out after {timeoutSeconds}s"); + throw; + } + + string stdout = await stdoutTask; + string stderr = await stderrTask; + + _output.WriteLine($" exit code: {process.ExitCode}"); + if (!string.IsNullOrWhiteSpace(stdout)) + _output.WriteLine($" stdout: {stdout.TrimEnd()}"); + if (!string.IsNullOrWhiteSpace(stderr)) + _output.WriteLine($" stderr: {stderr.TrimEnd()}"); + + Assert.True(process.ExitCode == 100, + $"GC stress test '{debuggeeName}' exited with {process.ExitCode} (expected 100).\nstdout: {stdout}\nstderr: {stderr}"); + + Assert.True(File.Exists(logFile), + $"GC stress results log not created: {logFile}\n" + + $" This usually means the cDAC stress framework failed to initialize\n" + + $" (e.g. could not load mscordaccore_universal, log directory missing,\n" + + $" or DOTNET_CdacStress not honored).\n" + + $"stdout: {stdout}\nstderr: {stderr}"); + + CdacStressResults results = CdacStressResults.Parse(logFile); + + _output.WriteLine($" results: {results}"); + + return results; + } + + /// + /// Asserts the GC stress run produced at least one verification and had no + /// hard failures. <see cref="CdacStressResults.KnownIssues"/> is intentionally + /// tolerated (the native harness emits [KNOWN_ISSUE] for acknowledged + /// divergences via s_knownIssueCount, separate from + /// s_failCount) but is logged so regressions in the known-issue + /// count are visible during triage. 
+ /// + internal static void AssertAllPassed(CdacStressResults results, string debuggeeName) + { + Assert.True(results.TotalVerifications > 0, + $"GC stress test '{debuggeeName}' produced zero verifications — " + + "the cDAC stress framework may not be enabled (DOTNET_CdacStress unset, " + + "or coreclr built without CDAC_STRESS)."); + + if (results.Failed > 0) + { + string analysis = results.AnalyzeFailures(maxFailures: 3); + Assert.Fail( + $"GC stress test '{debuggeeName}' had {results.Failed} failure(s) " + + $"out of {results.TotalVerifications} verifications " + + $"({results.KnownIssues} known issue(s) tolerated).\n" + + $"Log: {results.LogFilePath}\n\n{analysis}"); + } + } + + private static string GetCoreRoot() + { + // Explicit override wins (typical when running locally with a custom layout). + string? coreRoot = Environment.GetEnvironmentVariable("CORE_ROOT"); + if (!string.IsNullOrEmpty(coreRoot) && Directory.Exists(coreRoot)) + return coreRoot; + + // Helix layout: testhost is unpacked under HELIX_CORRELATION_PAYLOAD and + // corerun lives in shared/Microsoft.NETCore.App//. Pick the + // first version directory; the payload should contain exactly one. + string? helixPayload = Environment.GetEnvironmentVariable("HELIX_CORRELATION_PAYLOAD"); + if (!string.IsNullOrEmpty(helixPayload)) + { + string frameworkRoot = Path.Combine(helixPayload, "shared", "Microsoft.NETCore.App"); + if (Directory.Exists(frameworkRoot)) + { + string? versionDir = Directory.EnumerateDirectories(frameworkRoot).FirstOrDefault(); + if (versionDir is not null) + return versionDir; + } + } + + // Local fallback: derive from the repo's standard artifact layout. + string os = OperatingSystem.IsWindows() ? "windows" : OperatingSystem.IsMacOS() ? 
"osx" : "linux"; + string arch = RuntimeInformation.ProcessArchitecture.ToString().ToLowerInvariant(); + coreRoot = Path.Combine(FindRepoRoot(), "artifacts", "tests", "coreclr", $"{os}.{arch}.Checked", "Tests", "Core_Root"); + + if (!Directory.Exists(coreRoot)) + throw new DirectoryNotFoundException( + $"Core_Root not found at '{coreRoot}'. " + + "Set the CORE_ROOT environment variable or run 'src/tests/build.cmd Checked generatelayoutonly'."); + + return coreRoot; + } + + private static string GetDebuggeePath(string debuggeeName) + { + // On Helix, the work-item payload places debuggees as siblings of the + // test assembly at /debuggees//. Locally they're under + // artifacts/bin/StressTests////. + string? helixPayload = Environment.GetEnvironmentVariable("HELIX_WORKITEM_PAYLOAD"); + string root = !string.IsNullOrEmpty(helixPayload) + ? Path.Combine(helixPayload, "debuggees", debuggeeName) + : Path.Combine(FindRepoRoot(), "artifacts", "bin", "StressTests", debuggeeName); + + if (!Directory.Exists(root)) + throw new DirectoryNotFoundException( + $"Debuggee '{debuggeeName}' not found at '{root}'. Build the StressTests project first."); + + string dllName = $"{debuggeeName}.dll"; + string? dll = Directory.EnumerateFiles(root, dllName, SearchOption.AllDirectories).FirstOrDefault(); + if (dll is null) + throw new FileNotFoundException($"Could not find {dllName} under '{root}'"); + + return dll; + } + + private static string FindRepoRoot() + { + string? 
dir = AppContext.BaseDirectory; + while (dir is not null) + { + if (File.Exists(Path.Combine(dir, "global.json"))) + return dir; + dir = Path.GetDirectoryName(dir); + } + + throw new InvalidOperationException("Could not find repo root (global.json)"); + } +} diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/DeepStack/DeepStack.csproj b/src/native/managed/cdac/tests/StressTests/Debuggees/DeepStack/DeepStack.csproj new file mode 100644 index 00000000000000..6b512ec9245ec3 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/DeepStack/DeepStack.csproj @@ -0,0 +1 @@ + diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/DeepStack/Program.cs b/src/native/managed/cdac/tests/StressTests/Debuggees/DeepStack/Program.cs new file mode 100644 index 00000000000000..c98679aea54ac2 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/DeepStack/Program.cs @@ -0,0 +1,43 @@ +// Licensed to the .NET Foundation under one or more agreements. +// The .NET Foundation licenses this file to you under the MIT license. + +using System; +using System.Runtime.CompilerServices; + +/// +/// Exercises deep recursion with live GC references at each frame level. 
+/// +internal static class Program +{ + [MethodImpl(MethodImplOptions.NoInlining)] + static void NestedCall(int depth) + { + object o = new object(); + if (depth > 0) + NestedCall(depth - 1); + GC.KeepAlive(o); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void NestedWithMultipleRefs(int depth) + { + object a = new object(); + string b = $"depth-{depth}"; + int[] c = new int[depth + 1]; + if (depth > 0) + NestedWithMultipleRefs(depth - 1); + GC.KeepAlive(a); + GC.KeepAlive(b); + GC.KeepAlive(c); + } + + static int Main() + { + for (int i = 0; i < 2; i++) + { + NestedCall(10); + NestedWithMultipleRefs(8); + } + return 100; + } +} diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/DynamicMethods/DynamicMethods.csproj b/src/native/managed/cdac/tests/StressTests/Debuggees/DynamicMethods/DynamicMethods.csproj new file mode 100644 index 00000000000000..6b512ec9245ec3 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/DynamicMethods/DynamicMethods.csproj @@ -0,0 +1 @@ + diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/DynamicMethods/Program.cs b/src/native/managed/cdac/tests/StressTests/Debuggees/DynamicMethods/Program.cs new file mode 100644 index 00000000000000..865d338e1e935f --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/DynamicMethods/Program.cs @@ -0,0 +1,149 @@ +// Licensed to the .NET Foundation under one or more agreements. +// The .NET Foundation licenses this file to you under the MIT license. + +using System; +using System.Reflection; +using System.Reflection.Emit; +using System.Runtime.CompilerServices; + +/// +/// Exercises the MetaSig (non-GCRefMap) path by creating and invoking +/// DynamicMethod (LCG) methods. These methods use StoredSigMethodDesc +/// and don't have pre-computed GCRefMaps, forcing PromoteCallerStack +/// to walk the signature via MetaSig. 
+/// +/// Scenarios: +/// - Simple object parameter (GcTypeKind.Ref) +/// - Multiple object parameters +/// - Byref parameter (GcTypeKind.Interior) +/// - Mixed ref and primitive parameters +/// - Method with 'this' (instance delegate) +/// - Method returning object (tests return type parsing) +/// +internal static class Program +{ + static int Main() + { + for (int i = 0; i < 50; i++) + { + SimpleObjectParam(); + MultipleObjectParams(); + MixedParams(); + ObjectReturn(); + KeepAliveInDynamic(); + } + return 100; + } + + // ===== Scenario 1: Single object parameter ===== + [MethodImpl(MethodImplOptions.NoInlining)] + static void SimpleObjectParam() + { + // Create: void DynSimple(object o) + DynamicMethod dm = new("DynSimple", typeof(void), new[] { typeof(object) }); + ILGenerator il = dm.GetILGenerator(); + il.Emit(OpCodes.Ldarg_0); + il.Emit(OpCodes.Call, typeof(GC).GetMethod(nameof(GC.KeepAlive))!); + il.Emit(OpCodes.Ret); + + Action<object> del = dm.CreateDelegate<Action<object>>(); + object live = new object(); + del(live); + GC.KeepAlive(live); + } + + // ===== Scenario 2: Multiple object parameters ===== + [MethodImpl(MethodImplOptions.NoInlining)] + static void MultipleObjectParams() + { + // Create: void DynMulti(object a, string b, int[] c) + DynamicMethod dm = new("DynMulti", typeof(void), + new[] { typeof(object), typeof(string), typeof(int[]) }); + ILGenerator il = dm.GetILGenerator(); + il.Emit(OpCodes.Ldarg_0); + il.Emit(OpCodes.Call, typeof(GC).GetMethod(nameof(GC.KeepAlive))!); + il.Emit(OpCodes.Ldarg_1); + il.Emit(OpCodes.Call, typeof(GC).GetMethod(nameof(GC.KeepAlive))!); + il.Emit(OpCodes.Ldarg_2); + il.Emit(OpCodes.Call, typeof(GC).GetMethod(nameof(GC.KeepAlive))!); + il.Emit(OpCodes.Ret); + + var del = dm.CreateDelegate<Action<object, string, int[]>>(); + object a = new object(); + string b = "hello"; + int[] c = new[] { 1, 2, 3 }; + del(a, b, c); + GC.KeepAlive(a); + GC.KeepAlive(b); + GC.KeepAlive(c); + } + + // ===== Scenario 3: Mixed ref and primitive parameters ===== + 
[MethodImpl(MethodImplOptions.NoInlining)] + static void MixedParams() + { + // Create: void DynMixed(object o, int x, string s, long y) + DynamicMethod dm = new("DynMixed", typeof(void), + new[] { typeof(object), typeof(int), typeof(string), typeof(long) }); + ILGenerator il = dm.GetILGenerator(); + il.Emit(OpCodes.Ldarg_0); + il.Emit(OpCodes.Call, typeof(GC).GetMethod(nameof(GC.KeepAlive))!); + il.Emit(OpCodes.Ldarg_2); + il.Emit(OpCodes.Call, typeof(GC).GetMethod(nameof(GC.KeepAlive))!); + il.Emit(OpCodes.Ret); + + var del = dm.CreateDelegate<Action<object, int, string, long>>(); + object o = new object(); + string s = "world"; + del(o, 42, s, 999L); + GC.KeepAlive(o); + GC.KeepAlive(s); + } + + // ===== Scenario 4: Object return type ===== + [MethodImpl(MethodImplOptions.NoInlining)] + static void ObjectReturn() + { + // Create: object DynReturn(object o) + DynamicMethod dm = new("DynReturn", typeof(object), new[] { typeof(object) }); + ILGenerator il = dm.GetILGenerator(); + il.Emit(OpCodes.Ldarg_0); + il.Emit(OpCodes.Ret); + + var del = dm.CreateDelegate<Func<object, object>>(); + object input = new object(); + object result = del(input); + GC.KeepAlive(result); + GC.KeepAlive(input); + } + + // ===== Scenario 5: Four object parameters kept alive inside the dynamic method ===== + [MethodImpl(MethodImplOptions.NoInlining)] + static void KeepAliveInDynamic() + { + // Create: void DynAlloc(object a, object b, object c, object d) + DynamicMethod dm = new("DynAlloc", typeof(void), + new[] { typeof(object), typeof(object), typeof(object), typeof(object) }); + ILGenerator il = dm.GetILGenerator(); + il.Emit(OpCodes.Ldarg_0); + il.Emit(OpCodes.Call, typeof(GC).GetMethod(nameof(GC.KeepAlive))!); + il.Emit(OpCodes.Ldarg_1); + il.Emit(OpCodes.Call, typeof(GC).GetMethod(nameof(GC.KeepAlive))!); + il.Emit(OpCodes.Ldarg_2); + il.Emit(OpCodes.Call, typeof(GC).GetMethod(nameof(GC.KeepAlive))!); + il.Emit(OpCodes.Ldarg_3); + il.Emit(OpCodes.Call, typeof(GC).GetMethod(nameof(GC.KeepAlive))!); + il.Emit(OpCodes.Ret); + + var del = 
dm.CreateDelegate<Action<object, object, object, object>>(); + object a = new object(); + object b = "str"; + object c = new int[] { 1 }; + object d = new byte[16]; + del(a, b, c, d); + GC.KeepAlive(a); + GC.KeepAlive(b); + GC.KeepAlive(c); + GC.KeepAlive(d); + } +} diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/ExceptionHandling/ExceptionHandling.csproj b/src/native/managed/cdac/tests/StressTests/Debuggees/ExceptionHandling/ExceptionHandling.csproj new file mode 100644 index 00000000000000..6b512ec9245ec3 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/ExceptionHandling/ExceptionHandling.csproj @@ -0,0 +1 @@ + diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/ExceptionHandling/Program.cs b/src/native/managed/cdac/tests/StressTests/Debuggees/ExceptionHandling/Program.cs new file mode 100644 index 00000000000000..4bd0a12fe6d145 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/ExceptionHandling/Program.cs @@ -0,0 +1,143 @@ +// Licensed to the .NET Foundation under one or more agreements. +// The .NET Foundation licenses this file to you under the MIT license. + +using System; +using System.Runtime.CompilerServices; + +/// +/// Exercises exception handling: try/catch/finally funclets, nested exceptions, +/// filter funclets, and rethrow. 
+/// +internal static class Program +{ + [MethodImpl(MethodImplOptions.NoInlining)] + static void TryCatchScenario() + { + object before = new object(); + try + { + object inside = new object(); + ThrowHelper(); + GC.KeepAlive(inside); + } + catch (InvalidOperationException ex) + { + object inCatch = new object(); + GC.KeepAlive(ex); + GC.KeepAlive(inCatch); + } + GC.KeepAlive(before); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void ThrowHelper() + { + throw new InvalidOperationException("test exception"); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void TryFinallyScenario() + { + object outerRef = new object(); + try + { + object innerRef = new object(); + GC.KeepAlive(innerRef); + } + finally + { + object finallyRef = new object(); + GC.KeepAlive(finallyRef); + } + GC.KeepAlive(outerRef); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void NestedExceptionScenario() + { + object a = new object(); + try + { + try + { + object c = new object(); + throw new ArgumentException("inner"); + } + catch (ArgumentException ex1) + { + GC.KeepAlive(ex1); + throw new InvalidOperationException("outer", ex1); + } + finally + { + object d = new object(); + GC.KeepAlive(d); + } + } + catch (InvalidOperationException ex2) + { + GC.KeepAlive(ex2); + } + GC.KeepAlive(a); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void FilterExceptionScenario() + { + object holder = new object(); + try + { + throw new ArgumentException("filter-test"); + } + catch (ArgumentException ex) when (FilterCheck(ex)) + { + GC.KeepAlive(ex); + } + GC.KeepAlive(holder); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static bool FilterCheck(Exception ex) + { + object filterLocal = new object(); + GC.KeepAlive(filterLocal); + return ex.Message.Contains("filter"); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void RethrowScenario() + { + object outerRef = new object(); + try + { + try + { + throw new 
ApplicationException("rethrow-test"); + } + catch (ApplicationException) + { + object catchRef = new object(); + GC.KeepAlive(catchRef); + throw; + } + } + catch (ApplicationException ex) + { + GC.KeepAlive(ex); + } + GC.KeepAlive(outerRef); + } + + static int Main() + { + for (int i = 0; i < 2; i++) + { + TryCatchScenario(); + TryFinallyScenario(); + NestedExceptionScenario(); + FilterExceptionScenario(); + RethrowScenario(); + } + return 100; + } +} diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/Generics/Generics.csproj b/src/native/managed/cdac/tests/StressTests/Debuggees/Generics/Generics.csproj new file mode 100644 index 00000000000000..6b512ec9245ec3 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/Generics/Generics.csproj @@ -0,0 +1 @@ + diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/Generics/Program.cs b/src/native/managed/cdac/tests/StressTests/Debuggees/Generics/Program.cs new file mode 100644 index 00000000000000..54b7060c040f5a --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/Generics/Program.cs @@ -0,0 +1,81 @@ +// Licensed to the .NET Foundation under one or more agreements. +// The .NET Foundation licenses this file to you under the MIT license. + +using System; +using System.Collections.Generic; +using System.Runtime.CompilerServices; + +/// +/// Exercises generic method instantiations and interface dispatch. 
+/// +internal static class Program +{ + interface IKeepAlive + { + object GetRef(); + } + + class BoxHolder : IKeepAlive + { + object _value; + public BoxHolder() { _value = new object(); } + public BoxHolder(object v) { _value = v; } + + [MethodImpl(MethodImplOptions.NoInlining)] + public object GetRef() => _value; + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static T GenericAlloc<T>() where T : new() + { + T val = new T(); + object marker = new object(); + GC.KeepAlive(marker); + return val; + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void GenericScenario() + { + var o = GenericAlloc(); + var l = GenericAlloc>(); + var s = GenericAlloc(); + GC.KeepAlive(o); + GC.KeepAlive(l); + GC.KeepAlive(s); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void InterfaceDispatchScenario() + { + IKeepAlive holder = new BoxHolder(new int[] { 42, 43 }); + object r = holder.GetRef(); + GC.KeepAlive(holder); + GC.KeepAlive(r); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void DelegateScenario() + { + object captured = new object(); + Func<object> fn = () => + { + GC.KeepAlive(captured); + return new object(); + }; + object result = fn(); + GC.KeepAlive(result); + GC.KeepAlive(fn); + } + + static int Main() + { + for (int i = 0; i < 2; i++) + { + GenericScenario(); + InterfaceDispatchScenario(); + DelegateScenario(); + } + return 100; + } +} diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/MultiThread/MultiThread.csproj b/src/native/managed/cdac/tests/StressTests/Debuggees/MultiThread/MultiThread.csproj new file mode 100644 index 00000000000000..6b512ec9245ec3 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/MultiThread/MultiThread.csproj @@ -0,0 +1 @@ + diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/MultiThread/Program.cs b/src/native/managed/cdac/tests/StressTests/Debuggees/MultiThread/Program.cs new file mode 100644 index 00000000000000..0eea731a6bd313 --- /dev/null +++ 
b/src/native/managed/cdac/tests/StressTests/Debuggees/MultiThread/Program.cs @@ -0,0 +1,53 @@ +// Licensed to the .NET Foundation under one or more agreements. +// The .NET Foundation licenses this file to you under the MIT license. + +using System; +using System.Runtime.CompilerServices; +using System.Threading; + +/// +/// Exercises concurrent threads with GC references, exercising multi-threaded +/// stack walks and GC ref enumeration. +/// +internal static class Program +{ + [MethodImpl(MethodImplOptions.NoInlining)] + static void NestedCall(int depth) + { + object o = new object(); + if (depth > 0) + NestedCall(depth - 1); + GC.KeepAlive(o); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void ThreadWork(int id) + { + object threadLocal = new object(); + string threadName = $"thread-{id}"; + NestedCall(5); + GC.KeepAlive(threadLocal); + GC.KeepAlive(threadName); + } + + static int Main() + { + for (int iteration = 0; iteration < 2; iteration++) + { + ManualResetEventSlim ready = new ManualResetEventSlim(false); + ManualResetEventSlim go = new ManualResetEventSlim(false); + Thread t = new Thread(() => + { + ready.Set(); + go.Wait(); + ThreadWork(1); + }); + t.Start(); + ready.Wait(); + go.Set(); + ThreadWork(0); + t.Join(); + } + return 100; + } +} diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/PInvoke/PInvoke.csproj b/src/native/managed/cdac/tests/StressTests/Debuggees/PInvoke/PInvoke.csproj new file mode 100644 index 00000000000000..6b512ec9245ec3 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/PInvoke/PInvoke.csproj @@ -0,0 +1 @@ + diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/PInvoke/Program.cs b/src/native/managed/cdac/tests/StressTests/Debuggees/PInvoke/Program.cs new file mode 100644 index 00000000000000..83aece921baaea --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/PInvoke/Program.cs @@ -0,0 +1,74 @@ +// Licensed to the .NET Foundation under one or more 
agreements. +// The .NET Foundation licenses this file to you under the MIT license. + +using System; +using System.Runtime.CompilerServices; +using System.Runtime.InteropServices; + +/// +/// Exercises P/Invoke transitions with GC references before and after native calls, +/// and pinned GC handles. +/// +internal static class Program +{ + [DllImport("kernel32.dll")] + static extern uint GetCurrentThreadId(); + + [MethodImpl(MethodImplOptions.NoInlining)] + static void PInvokeScenario() + { + object before = new object(); + uint tid = GetCurrentThreadId(); + object after = new object(); + GC.KeepAlive(before); + GC.KeepAlive(after); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void PinnedScenario() + { + byte[] buffer = new byte[64]; + GCHandle pin = GCHandle.Alloc(buffer, GCHandleType.Pinned); + try + { + object other = new object(); + GC.KeepAlive(other); + GC.KeepAlive(buffer); + } + finally + { + pin.Free(); + } + } + + struct LargeStruct + { + public object A, B, C, D; + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void StructWithRefsScenario() + { + LargeStruct ls; + ls.A = new object(); + ls.B = "struct-string"; + ls.C = new int[] { 10, 20 }; + ls.D = new object(); + GC.KeepAlive(ls.A); + GC.KeepAlive(ls.B); + GC.KeepAlive(ls.C); + GC.KeepAlive(ls.D); + } + + static int Main() + { + for (int i = 0; i < 2; i++) + { + if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) + PInvokeScenario(); + PinnedScenario(); + StructWithRefsScenario(); + } + return 100; + } +} diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/StructScenarios/Program.cs b/src/native/managed/cdac/tests/StressTests/Debuggees/StructScenarios/Program.cs new file mode 100644 index 00000000000000..9067337495def2 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/StructScenarios/Program.cs @@ -0,0 +1,157 @@ +// Licensed to the .NET Foundation under one or more agreements. 
+// The .NET Foundation licenses this file to you under the MIT license. + +using System; +using System.Runtime.CompilerServices; + +/// +/// Exercises struct-related GC scanning scenarios that stress the MetaSig path: +/// - Value type 'this' (interior pointer for struct instance methods) +/// - Small struct returns (retbuf detection precision) +/// - Struct parameters containing embedded GC references +/// +internal static class Program +{ + static int Main() + { + for (int i = 0; i < 100; i++) + { + ValueTypeThisScenario(); + SmallStructReturnScenario(); + StructWithRefsScenario(); + InterfaceDispatchScenario(); + } + return 100; + } + + // ===== Scenario 1: Value type 'this' ===== + // When a struct instance method is called through interface dispatch, + // 'this' is an interior pointer (pointing into the boxed struct, past + // the MethodTable pointer). The GC needs GC_CALL_INTERIOR to handle it. + + interface IKeepAlive + { + object GetRef(); + } + + struct StructWithRef : IKeepAlive + { + public object Field; + + [MethodImpl(MethodImplOptions.NoInlining)] + public object GetRef() => Field; + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void ValueTypeThisScenario() + { + IKeepAlive s = new StructWithRef { Field = new object() }; + object r = s.GetRef(); + GC.KeepAlive(r); + GC.KeepAlive(s); + } + + // ===== Scenario 2: Small struct returns ===== + // Methods returning small structs (1/2/4/8 bytes, power-of-2) do NOT need + // a return buffer on AMD64 Windows — the value is returned in RAX. + // Conservative HasRetBuffArg=true shifts all parameter offsets by 1 slot. 
+ + struct SmallResult + { + public int Value; + } + + struct TinyResult + { + public byte Value; + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static SmallResult MakeSmallResult(object keepAlive) + { + GC.KeepAlive(keepAlive); + return new SmallResult { Value = 42 }; + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static TinyResult MakeTinyResult(object keepAlive) + { + GC.KeepAlive(keepAlive); + return new TinyResult { Value = 1 }; + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void SmallStructReturnScenario() + { + object live = new object(); + SmallResult sr = MakeSmallResult(live); + TinyResult tr = MakeTinyResult(live); + GC.KeepAlive(sr); + GC.KeepAlive(tr); + GC.KeepAlive(live); + } + + // ===== Scenario 3: Struct parameters with embedded GC refs ===== + // Value type parameters containing object references require GCDesc + // scanning to find the embedded refs. Without this, the refs inside + // the struct are invisible to the GC. + + struct Holder + { + public object Ref1; + public string Ref2; + public int[] Array; + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void ProcessHolder(Holder h) + { + GC.KeepAlive(h.Ref1); + GC.KeepAlive(h.Ref2); + GC.KeepAlive(h.Array); + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void StructWithRefsScenario() + { + Holder h = new Holder + { + Ref1 = new object(), + Ref2 = "hello", + Array = new int[] { 1, 2, 3 }, + }; + ProcessHolder(h); + GC.KeepAlive(h.Ref1); + } + + // ===== Scenario 4: Interface dispatch with generics ===== + // Shared generic methods going through stub dispatch combine + // RequiresInstArg with value type 'this'. 
+ + interface IGenericOp<T> + { + T Get(); + } + + struct GenericStruct<T> : IGenericOp<T> + { + public T Value; + + [MethodImpl(MethodImplOptions.NoInlining)] + public T Get() => Value; + } + + [MethodImpl(MethodImplOptions.NoInlining)] + static void InterfaceDispatchScenario() + { + IGenericOp<object> g = new GenericStruct<object> { Value = new object() }; + object r = g.Get(); + GC.KeepAlive(r); + GC.KeepAlive(g); + + IGenericOp<string> gs = new GenericStruct<string> { Value = "test" }; + string s = gs.Get(); + GC.KeepAlive(s); + GC.KeepAlive(gs); + } +} diff --git a/src/native/managed/cdac/tests/StressTests/Debuggees/StructScenarios/StructScenarios.csproj b/src/native/managed/cdac/tests/StressTests/Debuggees/StructScenarios/StructScenarios.csproj new file mode 100644 index 00000000000000..6b512ec9245ec3 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Debuggees/StructScenarios/StructScenarios.csproj @@ -0,0 +1 @@ + diff --git a/src/native/managed/cdac/tests/StressTests/Microsoft.Diagnostics.DataContractReader.StressTests.csproj b/src/native/managed/cdac/tests/StressTests/Microsoft.Diagnostics.DataContractReader.StressTests.csproj new file mode 100644 index 00000000000000..d6bd3aa5a13459 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/Microsoft.Diagnostics.DataContractReader.StressTests.csproj @@ -0,0 +1,21 @@ + + + true + $(NetCoreAppToolCurrent) + enable + true + + + + + + + + + + + + + + + diff --git a/src/native/managed/cdac/tests/StressTests/README.md b/src/native/managed/cdac/tests/StressTests/README.md index c5bcde5675b3f0..50aae0aa6158ba 100644 --- a/src/native/managed/cdac/tests/StressTests/README.md +++ b/src/native/managed/cdac/tests/StressTests/README.md @@ -1,108 +1,151 @@ # cDAC Stress Tests -This folder contains stress tests that verify the cDAC's stack reference -enumeration against the runtime's GC root scanning. 
The tests run managed -debuggee applications under `corerun` with cDAC stress flags enabled, -triggering verification at allocation points, GC points, or instruction-level -GC stress points. +Integration tests that verify the cDAC's stack reference enumeration matches the runtime's +GC root scanning under GC stress conditions. -## Quick Start +See [known-issues.md](known-issues.md) for the current pass/fail matrix and a catalog of +investigated gaps. -```powershell -# Prerequisites: build CoreCLR Checked and generate core_root -# build.cmd clr+libs -rc Checked -lc Release -# src\tests\build.cmd Checked generatelayoutonly /p:LibrariesConfiguration=Release +## Quickstart (Windows x64) -# Run all debuggees (allocation-point verification, no GCStress) -./RunStressTests.ps1 -SkipBuild +From the repo root: -# Run a single debuggee -./RunStressTests.ps1 -SkipBuild -Debuggee BasicAlloc +```powershell +# 1. Build CoreCLR + cDAC (Checked is recommended; Debug also works) +.\build.cmd -subset clr.native+tools.cdac -c Checked -rc Checked -lc Release -# Run with instruction-level GCStress (slower, more thorough) -./RunStressTests.ps1 -SkipBuild -CdacStress 0x14 -GCStress 0x4 +# 2. Generate the Core_Root layout the debuggees run against +.\src\tests\build.cmd Checked generatelayoutonly -SkipRestorePackages /p:LibrariesConfiguration=Release -# Full comparison including walk parity and DAC cross-check -./RunStressTests.ps1 -SkipBuild -CdacStress 0x74 -GCStress 0x4 +# 3. Run the stress suite (debuggees auto-built; allocation-point verification) +.\src\native\managed\cdac\tests\StressTests\RunStressTests.ps1 -SkipBuild -Configuration Checked ``` +Equivalent on Linux/macOS: replace `.cmd` with `.sh` and invoke `pwsh ./RunStressTests.ps1 ...`. + ## How It Works -### DOTNET_CdacStress Flags +Each test runs a debuggee console app under `corerun` with `DOTNET_CdacStress` set, which +turns on hooks in `src/coreclr/vm/cdacstress.cpp`. The native hook: + +1. 
Walks the stack at each managed allocation (the only trigger point currently wired — + `gchelpers.cpp` call sites; the historical `gccover.cpp` instruction-level hooks + have been removed). +2. Compares cDAC's `GetStackReferences` output against the runtime's own GC root + enumeration (the single oracle). +3. Writes structured per-frame results (with resolved method names) to + `DOTNET_CdacStressLogFile`. + +### `DOTNET_CdacStress` flag layout -The `DOTNET_CdacStress` environment variable is a bitmask that controls -**where** and **what** the runtime verifies: +One trigger point is wired today: allocation (`gchelpers.cpp`). This is +unrelated to `DOTNET_GCStress` (the JIT instruction stress feature). -| Bit | Flag | Description | -|-----|------|-------------| -| 0x1 | ALLOC | Verify at managed allocation points | -| 0x2 | GC | Verify at GC collection points | -| 0x4 | INSTR | Verify at instruction-level GC stress points (requires `DOTNET_GCStress`) | -| 0x10 | REFS | Compare GC stack references (cDAC vs runtime) | -| 0x20 | WALK | Compare stack walk frame ordering (cDAC vs DAC) | -| 0x40 | USE_DAC | Also compare GC refs against the legacy DAC | -| 0x100 | UNIQUE | Only verify each instruction pointer once | +| Bits | Name | Meaning | +|----------|-----------|-----------------------------------------------------------------| +| `0x001` | ALLOC | Verify at every managed allocation | +| `0x200` | VERBOSE | Rich per-ref diagnostics in the log | Common combinations: -- `0x11` — ALLOC + REFS (fast, default) -- `0x14` — INSTR + REFS (thorough, requires `DOTNET_GCStress=0x4`) -- `0x31` — ALLOC + REFS + WALK (fast with walk parity check) -- `0x74` — INSTR + REFS + WALK + USE_DAC (full comparison) +- `0x001` — ALLOC (default for `RunStressTests.ps1` and the xUnit tests) +- `0x201` — ALLOC + VERBOSE (use when triaging a mismatch) -### Verification Flow +### Pass/fail semantics in the log -At each stress point, the native hook (`cdacstress.cpp`) in the runtime: +- **[PASS]** — 
cDAC matches the runtime +- **[KNOWN_ISSUE]** — cDAC differs, but every diff is on a Frame the cDAC explicitly + marked as deferred (e.g. `PromoteCallerStack` not yet ported for that transition type) +- **[FAIL]** — cDAC differs from the runtime on a Frame that *should* be implemented, + or cDAC's `GetStackReferences` failed at the API boundary -1. Suspends the current thread's context -2. Calls the cDAC's `GetStackReferences` to enumerate GC roots -3. Compares against the runtime's own GC root enumeration -4. Optionally compares against the legacy DAC's enumeration -5. Optionally compares stack walk frame ordering -6. Logs `[PASS]` or `[FAIL]` per verification point +See [known-issues.md § Log Format](known-issues.md#log-format) for the per-frame log shape. -The script collects these results and reports aggregate pass/fail counts. +## Running Tests -## Debuggees +### Using `RunStressTests.ps1` (recommended for local dev) -Each debuggee is a standalone console application under `Debuggees/`: +```powershell +# Run all debuggees (allocation-point verification, no GCStress) +.\RunStressTests.ps1 -SkipBuild -Configuration Checked -| Debuggee | Scenarios | -|----------|-----------| -| **BasicAlloc** | Object allocation, strings, arrays, many live refs | -| **Comprehensive** | All-in-one: allocations, deep stacks, exceptions, generics, P/Invoke, threading | +# Run a single debuggee +.\RunStressTests.ps1 -SkipBuild -Configuration Checked -Debuggee BasicAlloc + +# Run with verbose per-ref diagnostics (use when triaging a mismatch) +.\RunStressTests.ps1 -SkipBuild -Configuration Checked -CdacStress 0x201 +``` + +Logs land under +`artifacts\tests\coreclr\..\Tests\cdacstresslogs\.log`. -All debuggees return exit code 100 on success. +### Using `dotnet test` (xUnit harness — same path CI runs) -### Adding a New Debuggee +The xUnit harness defaults to `DOTNET_CdacStress=0x001` (ALLOC). 
+ +```powershell +# Build and run all stress tests +.\.dotnet\dotnet.exe test src\native\managed\cdac\tests\StressTests -1. Create a new folder under `Debuggees/` (e.g., `Debuggees/MyScenario/`) -2. Add a minimal `.csproj`: - ```xml - - ``` - The `Directory.Build.props` provides all common settings. -3. Add a `Program.cs` with a `Main()` that returns 100 -4. Use `[MethodImpl(MethodImplOptions.NoInlining)]` and `GC.KeepAlive()` - to prevent the JIT from optimizing away allocations and references +# Run a specific debuggee +.\.dotnet\dotnet.exe test src\native\managed\cdac\tests\StressTests --filter "FullyQualifiedName~BasicAlloc" + +# Override CdacStress flags for a single run (e.g. enable verbose diagnostics) +$env:DOTNET_CdacStress = "0x201" +.\.dotnet\dotnet.exe test src\native\managed\cdac\tests\StressTests + +# Point at an existing Core_Root explicitly +$env:CORE_ROOT = "path\to\Core_Root" +.\.dotnet\dotnet.exe test src\native\managed\cdac\tests\StressTests +``` -The script auto-discovers all debuggees by scanning for `.csproj` files. +## Triaging Failures -## Script Parameters +1. Open the per-debuggee log (`.log`). +2. Search for `^\[FAIL\]` to find failing verifications. +3. Each failure prints `[STACK_TRACE]` with `cDAC=X RT=Y` per frame; the `[<-- MISMATCH]` + marker pinpoints the offending frame. +4. Cross-check against [known-issues.md](known-issues.md) — the gap may already be tracked. +5. To reproduce in a debugger, rerun the single debuggee under `corerun` with the same + `DOTNET_CdacStress` value and attach. 
-| Parameter | Default | Description | -|-----------|---------|-------------| -| `-Configuration` | `Checked` | Runtime build configuration | -| `-CdacStress` | `0x11` | Hex bitmask for `DOTNET_CdacStress` | -| `-GCStress` | _(empty)_ | Hex value for `DOTNET_GCStress` (e.g., `0x4`) | -| `-Debuggee` | _(all)_ | Which debuggee(s) to run | -| `-SkipBuild` | off | Skip CoreCLR/cDAC build step | -| `-SkipBaseline` | off | Skip baseline (no-stress) verification | +## Adding a New Debuggee -## Expected Results +1. Create a folder under `Debuggees/` with a `.csproj` and `Program.cs` +2. The `.csproj` just needs: `` + (inherits OutputType=Exe and TFM from `Directory.Build.props`) +3. `Main()` must return `100` on success +4. Use `[MethodImpl(MethodImplOptions.NoInlining)]` on methods to prevent inlining +5. Use `GC.KeepAlive()` to ensure objects are live at GC stress points +6. Add the debuggee name to `BasicStressTests.Debuggees` -Most runs achieve >99.5% pass rate. A small number of failures (~0.2%) -are expected due to the ScanFrameRoots gap — the cDAC does not yet enumerate -GC roots from explicit frame stub data (e.g., `StubDispatchFrame`, -`PInvokeCalliFrame`). These are tracked in [known-issues.md](known-issues.md). 
+## Debuggee Catalog + +| Debuggee | Scenarios | +|----------|-----------| +| **BasicAlloc** | Objects, strings, arrays, many live refs | +| **ExceptionHandling** | try/catch/finally funclets, nested exceptions, filter funclets, rethrow | +| **DeepStack** | Deep recursion with live refs at each frame | +| **Generics** | Generic method instantiations, interface dispatch, delegates | +| **PInvoke** | P/Invoke transitions, pinned GC handles, struct with object refs | +| **MultiThread** | Concurrent threads with synchronized GC stress | +| **Comprehensive** | All-in-one: every scenario in a single run | +| **StructScenarios** | Value-type `this`, small struct returns, struct parameters with embedded GC refs, generic interface dispatch | +| **DynamicMethods** | DynamicMethod / IL emit | + +## Architecture + +``` +CdacStressTestBase.RunGCStressAsync(debuggeeName) + │ + ├── Locate core_root/corerun (CORE_ROOT env or default path) + ├── Locate debuggee DLL (artifacts/bin/StressTests//...) + ├── Start Process: corerun + │ Environment: + │ DOTNET_CdacStress=0x001 + │ DOTNET_CdacStressLogFile= + │ DOTNET_ContinueOnAssert=1 + ├── Wait for exit (timeout: 300s) + ├── Parse results log → CdacStressResults + └── Assert: exit=100, zero failures +``` -Walk parity (`WALK` flag) should show 0 mismatches. diff --git a/src/native/managed/cdac/tests/StressTests/RunStressTests.ps1 b/src/native/managed/cdac/tests/StressTests/RunStressTests.ps1 index f51933048c5ac4..dcaa176f20d65f 100644 --- a/src/native/managed/cdac/tests/StressTests/RunStressTests.ps1 +++ b/src/native/managed/cdac/tests/StressTests/RunStressTests.ps1 @@ -11,31 +11,30 @@ Supports Windows, Linux, and macOS. 
- The DOTNET_CdacStress environment variable controls WHERE and WHAT is verified: - WHERE (low nibble): - 0x1 = ALLOC — verify at allocation points - 0x2 = GC — verify at GC points - 0x4 = INSTR — verify at instruction-level GC stress points (requires DOTNET_GCStress) - WHAT (high nibble): - 0x10 = REFS — compare GC stack references (cDAC vs runtime) - 0x20 = WALK — compare stack walk frames (cDAC vs DAC) - 0x40 = USE_DAC — also compare GC refs against DAC + The DOTNET_CdacStress environment variable controls WHEN verification fires: + TRIGGERS: + 0x001 = ALLOC — verify at every managed allocation MODIFIER: - 0x100 = UNIQUE — only verify each IP once + 0x200 = VERBOSE — rich per-ref diagnostics in the log + + The runtime's own GC root enumeration is the single oracle. Any trigger + causes cDAC's GetStackReferences output to be compared against it. .PARAMETER Configuration Runtime configuration: Checked (default) or Debug. +.PARAMETER CdacConfiguration + cDAC build configuration: Release (default) or Checked/Debug. + Stress runs default to Release because the cDAC is compared against the + runtime oracle on every trigger; cDAC-side asserts are not the oracle and + NativeAOT Checked/Debug is roughly 5x slower, dominating stress wall-time. + Override to Checked/Debug if you want cDAC asserts while reproducing a + specific failure. + .PARAMETER CdacStress - Hex value for DOTNET_CdacStress flags. Default: 0x11 (ALLOC|REFS). + Hex value for DOTNET_CdacStress flags. Default: 0x001 (ALLOC). Common values: - 0x11 = ALLOC|REFS (fast, allocation points only) - 0x14 = INSTR|REFS (thorough, requires GCStress) - 0x74 = INSTR|REFS|WALK|USE_DAC (full comparison, slow) - -.PARAMETER GCStress - Hex value for DOTNET_GCStress. Default: empty (disabled). - Set to 0x4 for instruction-level stress. + 0x001 = ALLOC (allocation points only, every hit verified) .PARAMETER Debuggee Which debuggee(s) to run. Default: All. 
@@ -50,16 +49,16 @@ .EXAMPLE ./RunStressTests.ps1 -SkipBuild ./RunStressTests.ps1 -Debuggee BasicAlloc -SkipBuild - ./RunStressTests.ps1 -CdacStress 0x74 -GCStress 0x4 # Full comparison with GCStress - ./RunStressTests.ps1 -CdacStress 0x114 -SkipBuild # Unique IPs only + ./RunStressTests.ps1 -CdacStress 0x201 -SkipBuild # ALLOC + VERBOSE #> param( [ValidateSet("Checked", "Debug")] [string]$Configuration = "Checked", - [string]$CdacStress = "0x11", + [ValidateSet("Release", "Checked", "Debug")] + [string]$CdacConfiguration = "Release", - [string]$GCStress = "", + [string]$CdacStress = "0x001", [string[]]$Debuggee = @(), @@ -126,8 +125,8 @@ Write-Host "=== cDAC Stress Test ===" -ForegroundColor Cyan Write-Host " Repo root: $repoRoot" Write-Host " Platform: $platformId" Write-Host " Configuration: $Configuration" +Write-Host " CdacConfig: $CdacConfiguration" Write-Host " CdacStress: $CdacStress" -Write-Host " GCStress: $(if ($GCStress) { $GCStress } else { '(disabled)' })" Write-Host " Debuggees: $($selectedDebuggees -join ', ')" Write-Host "" @@ -135,10 +134,16 @@ Write-Host "" # Step 1: Build CoreCLR + cDAC # --------------------------------------------------------------------------- if (-not $SkipBuild) { - Write-Host ">>> Step 1: Building CoreCLR native + cDAC tools ($Configuration)..." -ForegroundColor Yellow + Write-Host ">>> Step 1: Building CoreCLR native ($Configuration) + cDAC tools ($CdacConfiguration)..." -ForegroundColor Yellow Push-Location $repoRoot try { - $buildArgs = @("-subset", "clr.native+tools.cdac", "-c", $Configuration, "-rc", $Configuration, "-lc", "Release", "-bl") + # cDAC (mscordaccore_universal) is built via the 'tools' category, which + # picks up ToolsConfiguration. Explicitly set it to $CdacConfiguration + # (default: Release) so the in-process stress framework loads an + # optimized NAOT shim. 
Otherwise it falls back to -c $Configuration + # (default: Checked) and DebugOnlyCodeHolder/contract-asserts dominate + # the profile and inflate wall time ~5x. + $buildArgs = @("-subset", "clr.native+tools.cdac", "-c", $Configuration, "-rc", $Configuration, "-lc", "Release", "/p:ToolsConfiguration=$CdacConfiguration", "-bl") & $buildCmd @buildArgs if ($LASTEXITCODE -ne 0) { Write-Error "Build failed with exit code $LASTEXITCODE"; exit 1 } } finally { @@ -153,6 +158,19 @@ if (-not $SkipBuild) { } & $testBuildScript $Configuration generatelayoutonly -SkipRestorePackages /p:LibrariesConfiguration=Release if ($LASTEXITCODE -ne 0) { Write-Error "Core_root generation failed"; exit 1 } + + # Copy the cDAC NAOT shim (built into artifacts/bin/coreclr/../) + # into core_root. The generatelayoutonly step above populates core_root from + # the runtime-config sharedFramework but does not include the cDAC binary + # from a different config. Force-copy ours so the framework loads the right + # build flavor regardless of -CdacConfiguration. + $cdacSrc = Join-Path $repoRoot "artifacts" "bin" "coreclr" "$platformId.$CdacConfiguration" $cdacDll + if (Test-Path $cdacSrc) { + Copy-Item -Path $cdacSrc -Destination (Join-Path $coreRoot $cdacDll) -Force + Write-Host " Copied $cdacDll from $CdacConfiguration build into core_root." -ForegroundColor DarkGray + } else { + Write-Warning "$cdacDll not found at $cdacSrc -- core_root may have wrong-config cDAC." 
+ } } else { Write-Host ">>> Step 1: Skipping build (-SkipBuild)" -ForegroundColor DarkGray if (!(Test-Path $corerunExe)) { @@ -196,7 +214,6 @@ function Find-DebuggeeDll([string]$name) { # Helper: clear stress environment variables function Clear-StressEnv { - Remove-Item Env:\DOTNET_GCStress -ErrorAction SilentlyContinue Remove-Item Env:\DOTNET_CdacStress -ErrorAction SilentlyContinue Remove-Item Env:\DOTNET_CdacStressLogFile -ErrorAction SilentlyContinue Remove-Item Env:\DOTNET_ContinueOnAssert -ErrorAction SilentlyContinue @@ -232,14 +249,13 @@ if (-not $SkipBaseline) { # --------------------------------------------------------------------------- # Step 4: Run with cDAC stress # --------------------------------------------------------------------------- -Write-Host ">>> Step 4: Running with CdacStress=$CdacStress$(if ($GCStress) { " GCStress=$GCStress" })..." -ForegroundColor Yellow +Write-Host ">>> Step 4: Running with CdacStress=$CdacStress..." -ForegroundColor Yellow $logDir = Join-Path $repoRoot "artifacts" "tests" "coreclr" "$platformId.$Configuration" "Tests" "cdacstresslogs" New-Item -ItemType Directory -Force $logDir | Out-Null $totalPasses = 0 $totalFails = 0 -$totalWalkOK = 0 -$totalWalkMM = 0 +$totalKnown = 0 $failedDebuggees = @() $sw = [System.Diagnostics.Stopwatch]::StartNew() @@ -252,9 +268,6 @@ foreach ($d in $selectedDebuggees) { $env:DOTNET_CdacStress = $CdacStress $env:DOTNET_CdacStressLogFile = $logFile $env:DOTNET_ContinueOnAssert = "1" - if ($GCStress) { - $env:DOTNET_GCStress = $GCStress - } $dSw = [System.Diagnostics.Stopwatch]::StartNew() & $corerunExe $dll @@ -262,24 +275,22 @@ foreach ($d in $selectedDebuggees) { $dSw.Stop() # Parse results - $passes = 0; $fails = 0; $walkOK = 0; $walkMM = 0 + $passes = 0; $fails = 0; $known = 0 if (Test-Path $logFile) { $logContent = Get-Content $logFile $passes = ($logContent | Select-String "^\[PASS\]").Count $fails = ($logContent | Select-String "^\[FAIL\]").Count - $walkOK = ($logContent | 
Select-String "WALK_OK").Count - $walkMM = ($logContent | Select-String "WALK_MISMATCH").Count + $known = ($logContent | Select-String "^\[KNOWN_ISSUE\]").Count } $totalPasses += $passes $totalFails += $fails - $totalWalkOK += $walkOK - $totalWalkMM += $walkMM + $totalKnown += $known $status = if ($ec -eq 100) { "PASS" } else { "FAIL"; $failedDebuggees += $d } $color = if ($ec -eq 100 -and $fails -eq 0) { "Green" } elseif ($ec -eq 100) { "Yellow" } else { "Red" } - $detail = "refs=$passes/$($passes+$fails)" - if ($walkOK -gt 0 -or $walkMM -gt 0) { $detail += " walk=$walkOK/$($walkOK+$walkMM)" } + $detail = "refs=$passes/$($passes+$fails+$known)" + if ($known -gt 0) { $detail += " known=$known" } Write-Host " $d — $status ($detail) [$($dSw.Elapsed.ToString('mm\:ss'))]" -ForegroundColor $color } @@ -293,8 +304,8 @@ Write-Host "" Write-Host "=== Summary ===" -ForegroundColor Cyan Write-Host " Elapsed: $($sw.Elapsed.ToString('mm\:ss'))" Write-Host " Stress refs: $totalPasses PASS / $totalFails FAIL" -ForegroundColor $(if ($totalFails -eq 0) { "Green" } else { "Yellow" }) -if ($totalWalkOK -gt 0 -or $totalWalkMM -gt 0) { - Write-Host " Walk parity: $totalWalkOK OK / $totalWalkMM MISMATCH" -ForegroundColor $(if ($totalWalkMM -eq 0) { "Green" } else { "Yellow" }) +if ($totalKnown -gt 0) { + Write-Host " Known issues: $totalKnown (deferred-frame diffs, not real failures)" -ForegroundColor Yellow } Write-Host " Logs: $logDir" diff --git a/src/native/managed/cdac/tests/StressTests/StressTests.targets b/src/native/managed/cdac/tests/StressTests/StressTests.targets new file mode 100644 index 00000000000000..3bd07f3e9fc521 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/StressTests.targets @@ -0,0 +1,70 @@ + + + + $(MSBuildThisFileDirectory)Debuggees\ + Release + + + + + + + + + + + + + + + + + + + + + <_HelixTestsDir>$([MSBuild]::NormalizeDirectory('$(HelixPayloadDir)', 'tests')) + <_HelixDebuggeesDir>$([MSBuild]::NormalizeDirectory('$(HelixPayloadDir)', 
'debuggees')) + + + + + <_TestOutput Include="$(OutputPath)**/*" /> + + + + + + <_XunitConsoleFiles Include="$([System.IO.Path]::GetDirectoryName('$(XunitConsoleNetCoreAppPath)'))/*" /> + + + + + + + + + + + <_DebuggeeOutputDir>$([MSBuild]::NormalizeDirectory('$(RepoRoot)', 'artifacts', 'bin', 'StressTests', '$(DebuggeeName)', '$(DebuggeeConfiguration)')) + + + <_DebuggeeFiles Include="$(_DebuggeeOutputDir)**/*" /> + + + + diff --git a/src/native/managed/cdac/tests/StressTests/cdac-stress-helix.proj b/src/native/managed/cdac/tests/StressTests/cdac-stress-helix.proj new file mode 100644 index 00000000000000..bf00e68a2d5040 --- /dev/null +++ b/src/native/managed/cdac/tests/StressTests/cdac-stress-helix.proj @@ -0,0 +1,83 @@ + + + + + + msbuild + true + true + true + $(_Creator) + true + $(BUILD_BUILDNUMBER) + test/cdac/stresstests/ + pr/dotnet/runtime/cdac-stress-tests + 01:00:00 + + + + + + %(Identity) + + + + + + + + + + @(HelixPreCommand) + + + + + + + <_StressTestCommand>%25HELIX_CORRELATION_PAYLOAD%25\dotnet.exe exec --runtimeconfig %25HELIX_WORKITEM_PAYLOAD%25\tests\Microsoft.Diagnostics.DataContractReader.StressTests.runtimeconfig.json --depsfile %25HELIX_WORKITEM_PAYLOAD%25\tests\Microsoft.Diagnostics.DataContractReader.StressTests.deps.json %25HELIX_WORKITEM_PAYLOAD%25\tests\xunit.console.dll %25HELIX_WORKITEM_PAYLOAD%25\tests\Microsoft.Diagnostics.DataContractReader.StressTests.dll -xml testResults.xml -nologo + + + <_StressTestCommand>$HELIX_CORRELATION_PAYLOAD/dotnet exec --runtimeconfig $HELIX_WORKITEM_PAYLOAD/tests/Microsoft.Diagnostics.DataContractReader.StressTests.runtimeconfig.json --depsfile $HELIX_WORKITEM_PAYLOAD/tests/Microsoft.Diagnostics.DataContractReader.StressTests.deps.json $HELIX_WORKITEM_PAYLOAD/tests/xunit.console.dll $HELIX_WORKITEM_PAYLOAD/tests/Microsoft.Diagnostics.DataContractReader.StressTests.dll -xml testResults.xml -nologo + + + + + $(StressTestsPayload) + $(_StressTestCommand) + $(WorkItemTimeout) + + + + + diff --git 
a/src/native/managed/cdac/tests/StressTests/known-issues.md b/src/native/managed/cdac/tests/StressTests/known-issues.md index 6445d255b67362..838e4491d2dc48 100644 --- a/src/native/managed/cdac/tests/StressTests/known-issues.md +++ b/src/native/managed/cdac/tests/StressTests/known-issues.md @@ -1,57 +1,97 @@ # cDAC Stack Reference Walking — Known Issues -This document tracks known gaps between the cDAC's stack reference enumeration -and the legacy DAC's `GetStackReferences`. - -## Current Test Results - -Using `DOTNET_CdacStress` with cDAC-vs-DAC comparison: - -| Mode | Non-EH debuggees (6) | ExceptionHandling | -|------|-----------------------|-------------------| -| INSTR (0x4 + GCStress=0x4, step=10) | 0 failures | 0-2 failures | -| ALLOC+UNIQUE (0x101) | 0 failures | 4 failures | -| Walk comparison (0x20, IP+SP) | 0 mismatches | N/A | - -## Known Issue: cDAC Cannot Unwind Through Native Frames - -**Severity**: Low — only affects live-process stress testing during active -exception first-pass dispatch. Does not affect dump analysis where the thread -is suspended with a consistent Frame chain. - -**Pattern**: `cDAC < DAC` (cDAC reports 4 refs, DAC reports 10-13). -ExceptionHandling debuggee only, 4 deterministic occurrences per run. - -**Root cause**: The cDAC's `AMD64Unwinder.Unwind` (and equivalents for other -architectures) can only unwind **managed** frames — it checks -`ExecutionManager.GetCodeBlockHandle(IP)` first and returns false if the IP -is not in a managed code range. This means it cannot unwind through native -runtime frames (allocation helpers, EH dispatch code, etc.). - -When the allocation stress point fires during exception first-pass dispatch: - -1. The thread's `m_pFrame` is `FRAME_TOP` (no explicit Frames in the chain - because the InlinedCallFrame/SoftwareExceptionFrame have been popped or - not yet pushed at that point in the EH dispatch sequence) -2. The initial IP is in native code (allocation helper) -3. 
The cDAC attempts to unwind through native frames but - `GetCodeBlockHandle` returns null for native IPs → unwind fails -4. With no Frames and no ability to unwind, the walk stops early - -The legacy DAC's `DacStackReferenceWalker::WalkStack` succeeds because -`StackWalkFrames` calls `VirtualUnwindToFirstManagedCallFrame` which uses -OS-level unwind (`RtlVirtualUnwind` on Windows, `PAL_VirtualUnwind` on Unix) -that can unwind ANY native frame using PE `.pdata`/`.xdata` sections. - -**Possible fixes**: -1. **Ensure Frames are always available** — change the runtime to keep - an explicit Frame pushed during allocation points within EH dispatch. - The cDAC cannot do OS-level native unwind (it operates on dumps where - `RtlVirtualUnwind` is not available). The Frame chain is the only - mechanism the cDAC has for transitioning through native code to reach - managed frames. If `m_pFrame = FRAME_TOP` when the IP is native, the - cDAC cannot proceed. -2. **Accept as known limitation** — these failures only occur during - live-process stress testing at a narrow window during EH first-pass - dispatch. In dumps, the exception state is frozen and the Frame chain - is consistent. +This document tracks known gaps between the cDAC's stack reference +enumeration and the runtime's own GC root scanning, exposed by the +`cdacstress` framework (`src/coreclr/vm/cdacstress.cpp`). + +## Verification verdicts + +When running `RunStressTests.ps1` (Checked, `DOTNET_CdacStress=0x001` = +`ALLOC`), each verification is bucketed into one of: + +| Verdict | Meaning | +|---------|---------| +| `[PASS]` | cDAC matches the runtime's GC root enumeration. | +| `[KNOWN_ISSUE]` | cDAC and the runtime differ, but every diff is on a Frame the cDAC explicitly marked as deferred (see Bucket 1 below). Not a regression. | +| `[FAIL]` | A real cDAC vs runtime discrepancy, or `GetStackReferences` failed at the API boundary. Investigate. 
| + +The native harness detects the deferred-frame sentinels emitted by the +cDAC managed code and relabels per-frame diffs as `[KNOWN_NIE]` +in the structured log. + +## Open buckets + +### Bucket 1: `PromoteCallerStack` requires `ICallingConvention` + +`GcScanner.PromoteCallerStack` (in +`src/native/managed/cdac/.../Contracts/StackWalk/GC/GcScanner.cs`) +is deliberately stubbed: instead of enumerating the caller's argument +refs it records the frame as deferred and returns. Producing correct +caller-argument layouts requires porting `ArgIterator` behind the +`ICallingConvention` contract, which is a separate deferred work item. + +To prevent these deferred frames from masquerading as real cDAC bugs, +the managed code records each deferred frame on the `GcScanContext` +via `RecordDeferredFrame`, which emits a sentinel `StackRefData` entry +with `GcScanFlags.CDAC_DEFERRED_FRAME` (0x40000000) set. The native +stress harness strips these sentinels and re-classifies any RT-only +diff at a deferred Source address as `[KNOWN_NIE]`, and the +whole verification as `[KNOWN_ISSUE]` rather than `[FAIL]`. + +Expected pattern in the log: + +``` +[KNOWN_ISSUE] Thread=0x... IP=0x... cDAC=6 RT=7 frames=5 (match=4 mismatch=0 known_nie=1) + Frame #4 [KNOWN_NIE] cDAC=0 RT=1 SP_cDAC=0x0 SP_RT=0x0 + [NIE(RT)] Addr=0x... Obj=0x... Flags=0x0 Reg=-1 Off=0 + [STACK_TRACE] (cDAC=6 RT=7 frames=5) + #0 System.AppContext.Setup(...) (cDAC=2 RT=2) + ... + #4 (cDAC=0 RT=1) <-- KNOWN_NIE (PromoteCallerStack deferred) +``` + +Every JIT frame's count matches exactly; the only discrepancy is on +the explicit transition Frame that `PromoteCallerStack` would scan. + +To re-enable: implement `ICallingConvention.PortableArgumentIterator`, +then replace the `RecordDeferredFrame` stub in `PromoteCallerStack` with +a call into the new contract. 
Once that lands, the previously-tracked +`ELEMENT_TYPE_INTERNAL` (0x21) case in signature decoding will also +need to be handled — that case currently isn't reachable because +`PromoteCallerStack` short-circuits without iterating the signature. + +## Future work + +- Investigate the GcInfo safe-point bitmap decoding difference for + QCall frames. +- Replace `fprintf`-based stress logging in `cdacstress.cpp` with a + more structured mechanism (e.g., ETW events or StressLog) for better + tooling integration and reduced I/O overhead during stress runs. + +## Log Format + +Each verification emits a single header line followed by, on `[FAIL]` or +`[KNOWN_ISSUE]`, a per-broken-frame block and a stack trace. + +``` +[PASS] Thread=0x... IP=0x... cDAC=N RT=N frames=N + +[KNOWN_ISSUE] Thread=0x... IP=0x... cDAC=N RT=M frames=N (match=N mismatch=N known_nie=N) + Frame #i [KNOWN_NIE] cDAC=X RT=Y SP_cDAC=0x... SP_RT=0x... + [NIE(RT)] Addr=0x... Obj=0x... Flags=0x... Reg=N Off=N + [STACK_TRACE] (cDAC=N RT=M frames=N) + #i MethodName (cDAC=X RT=Y) + #i (cDAC=X RT=Y) <-- KNOWN_NIE (PromoteCallerStack deferred) + +[FAIL] Thread=0x... IP=0x... cDAC=N RT=M frames=N (match=N mismatch=N known_nie=N) + Frame #i MethodName [MISMATCH] cDAC=X RT=Y SP_cDAC=0x... SP_RT=0x... + [ONLY(cDAC)] Addr=0x... Obj=0x... Flags=0x... Reg=N Off=N + [ONLY(RT)] Addr=0x... Obj=0x... Flags=0x... Reg=N Off=N + [STACK_TRACE] (cDAC=N RT=M frames=N) + #i MethodName (cDAC=X RT=Y) [<-- MISMATCH] +``` + +Frames whose counts match are omitted from the per-frame block in +concise mode; verbose mode (`DOTNET_CdacStress=0x201`) also emits the +matched refs. + diff --git a/src/native/managed/cdac/tests/UnitTests/README.md b/src/native/managed/cdac/tests/UnitTests/README.md index ace3cf7697bdb3..4718294287421a 100644 --- a/src/native/managed/cdac/tests/UnitTests/README.md +++ b/src/native/managed/cdac/tests/UnitTests/README.md @@ -3,6 +3,14 @@ Unit tests for the cDAC data contract reader. 
Tests use mock memory to simulate a target process without needing a real runtime. +For integration tests that exercise the cDAC against a real runtime, see: + +- [DumpTests](../DumpTests/README.md) — validates cDAC contracts against crash dumps + produced by purpose-built debuggees. +- [StressTests](../StressTests/README.md) — in-process GC stress verification that + compares cDAC stack-reference enumeration against the runtime's own GC root + scanning at every wired stress trigger point (currently managed allocation). + ## Building and running ```bash