[cdac] x86: rewire stack-walker to shared CallingConvention / GCRefMap#129858
Draft
max-charlamb wants to merge 33 commits into
Draft
[cdac] x86: rewire stack-walker to shared CallingConvention / GCRefMap#129858max-charlamb wants to merge 33 commits into
max-charlamb wants to merge 33 commits into
Conversation
Contributor
|
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR expands the cDAC stack-walking and stress infrastructure by wiring Windows x86 transition-frame caller scanning to the shared ICallingConvention/GCRefMap machinery, and by extending the in-proc cdacstress harness to support multiple sub-checks (GCREFS + ARGITER) with richer test/debuggee coverage.
Changes:
- Reworked caller-stack promotion to optionally synthesize and decode GCRefMap blobs via a new
ICallingConvention.TryComputeArgGCRefMapBlobpath and shared token enumeration. - Extended cdacstress configuration/telemetry (flag layout,
[GC_STATS]/[ARG_STATS]) and updated managed stress harness parsing/assertions; added multiple new stress debuggees. - Refactored/shared calling convention code (ArgIterator/TransitionBlock/ITypeHandle) for reuse by cDAC and ReadyToRun tooling.
Reviewed changes
Copilot reviewed 60 out of 60 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/managed/cdac/tests/StressTests/RunStressTests.ps1 | Updates stress flag documentation/defaults to new WHERE/WHAT/MODIFIER layout. |
| src/native/managed/cdac/tests/StressTests/README.md | Documents new flag regions, sub-check markers, and updated defaults. |
| src/native/managed/cdac/tests/StressTests/known-issues.md | Updates defaults and adds documentation for an intermittent x86 flake. |
| src/native/managed/cdac/tests/StressTests/Debuggees/VarArgs/VarArgs.csproj | Adds VarArgs debuggee project file. |
| src/native/managed/cdac/tests/StressTests/Debuggees/VarArgs/Program.cs | Adds varargs-focused ARGITER debuggee. |
| src/native/managed/cdac/tests/StressTests/Debuggees/StructScenarios/Program.cs | Adds nested-struct scenario to stress ArgIterator/GCDesc paths. |
| src/native/managed/cdac/tests/StressTests/Debuggees/CrossModule/Program.cs | Adds cross-module signature/type-resolution debuggee. |
| src/native/managed/cdac/tests/StressTests/Debuggees/CrossModule/Lib/Types.cs | Adds library types used by CrossModule debuggee to force cross-module resolution. |
| src/native/managed/cdac/tests/StressTests/Debuggees/CrossModule/Lib/CrossModuleLib.csproj | Adds library project for CrossModule debuggee. |
| src/native/managed/cdac/tests/StressTests/Debuggees/CrossModule/CrossModule.csproj | Adds main CrossModule debuggee project file + ProjectReference wiring. |
| src/native/managed/cdac/tests/StressTests/Debuggees/CallSignatures/Program.cs | Adds comprehensive ARGITER signature-shape coverage debuggee. |
| src/native/managed/cdac/tests/StressTests/Debuggees/CallSignatures/CallSignatures.csproj | Adds CallSignatures debuggee project file and warning suppressions. |
| src/native/managed/cdac/tests/StressTests/CdacStressTests.cs | Replaces prior basic test entrypoint with expanded debuggee catalog and GCREFS/ARGITER theories. |
| src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs | Adds per-mode stress runner and stricter assertions/target detection. |
| src/native/managed/cdac/tests/StressTests/CdacStressResults.cs | Adds parsing of [GC_STATS] / [ARG_STATS] and captures ARGITER divergence lines. |
| src/native/managed/cdac/tests/StressTests/BasicCdacStressTests.cs | Removes the older stress test harness class (superseded by CdacStressTests). |
| src/native/managed/cdac/tests/DumpTests/StackReferenceDumpTests.cs | Removes x86 skip now that x86 GCInfo decoding is supported. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/StressTestApi/CdacStressApi.cs | Adds legacy DAC request handlers for cdacstress private opcodes, including ARGITER blob request. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.IXCLRDataProcess.cs | Routes stress private requests to the new CdacStressApi helper. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/RuntimeTypeSystemHelpers/MethodTableFlags_1.cs | Extends MethodTable flags helpers (shared instantiation + byreflike). |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Microsoft.Diagnostics.DataContractReader.Contracts.csproj | Links in shared CallingConvention/ArgIterator sources into Contracts build. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/DataType.cs | Adds PInvokeCalliFrame to the cDAC type discriminator enum. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/TransitionBlock.cs | Adds OffsetOfArgs field address to support correct x86 GCRefMap pos->addr mapping. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/PInvokeCalliFrame.cs | Adds cDAC data type to expose VASigCookiePtr for PInvokeCalliFrame. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/CoreCLRContracts.cs | Registers the new ICallingConvention contract implementation. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/StackWalk_1.cs | Swallows NotImplementedException per-frame to allow partial stack-walk results. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanner.cs | Reworks caller-stack scanning to use synthesized GCRefMap blobs on Windows x86/x64; factors shared token iteration; fixes x86 DynamicHelperFrame arg-reg offsets; corrects x86 pos->addr mapping. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GCRefMapDecoder.cs | Adds a byte[]-backed decoder constructor for host-synthesized blobs. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/FrameHandling/X86FrameHandler.cs | Computes x86 transition-frame caller SP by decoding stack-pop prefix (or VASigCookie for PInvokeCalliFrame). |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/RuntimeTypeSystem_1.cs | Implements new RTS helpers (byreflike, unboxing stub, approx field type handle). |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCTransition.cs | Fixes ctor assignments for IsThis/Iptr fields. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCArgTable.cs | Fixes several x86 arg table decoding issues and removes debug prints; adjusts stack-depth transition deltas. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/CallingConvention/CdacTypeHandle.cs | Adds cDAC-backed ITypeHandle implementation for shared ArgIterator. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/CallingConvention/ArgumentLocation.cs | Adds internal argument-location record for encoder bookkeeping. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IRuntimeTypeSystem.cs | Adds new contract APIs (byreflike, unboxing stub, approx field type handle). |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/ICallingConvention.cs | Introduces new calling-convention contract + public surface. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/ContractRegistry.cs | Exposes CallingConvention contract property on registry. |
| src/coreclr/vm/frames.h | Exposes PInvokeCalliFrame::m_pVASigCookie via cdac_data specialization. |
| src/coreclr/vm/datadescriptor/datadescriptor.inc | Adds PInvokeCalliFrame descriptor and registers CallingConvention global contract. |
| src/coreclr/vm/cdacstress.cpp | Adds WHAT/WHERE/MODIFIER flag regions, ARGITER sub-check (runtime-vs-cDAC blob comparison), and emits [GC_STATS]/[ARG_STATS]. |
| src/coreclr/tools/Common/CallingConvention/TransitionBlock.cs | Moves TransitionBlock into Internal.CallingConvention and refactors to use ITypeHandle. |
| src/coreclr/tools/Common/CallingConvention/SystemVAmd64PassingDescriptor.cs | Adds extracted SystemV descriptor types for shared use. |
| src/coreclr/tools/Common/CallingConvention/ITypeHandle.cs | Adds shared type abstraction for calling convention computation. |
| src/coreclr/tools/Common/CallingConvention/FpStructInRegistersInfo.cs | Adds extracted RISC-V/LoongArch FP struct classification types for shared use. |
| src/coreclr/tools/Common/CallingConvention/ArgIterator.cs | Refactors ArgIterator into Internal.CallingConvention and reworks it to use ITypeHandle + shared TransitionBlock. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/WasmLowering.ReadyToRun.cs | Updates Wasm lowering code to use new ITypeHandle shape. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/ILCompiler.ReadyToRun.csproj | Links in shared CallingConvention sources and adjusts ReadyToRun file list. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/WasmR2RToInterpreterThunkNode.cs | Updates using/aliasing for new ArgIterator namespace. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/WasmInterpreterToR2RThunkNode.cs | Updates using/aliasing for new ArgIterator namespace. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/WasmImportThunk.cs | Updates using/aliasing for new ArgIterator namespace. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/TypeHandle.cs | Adds crossgen2-backed ITypeHandle implementation. |
| src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/GCRefMapBuilder.cs | Updates builder to use shared TransitionBlock/ArgIterator/ITypeHandle. |
| src/coreclr/inc/dacprivate.h | Adds new private request opcode and request struct for ARGITER blob retrieval (CDAC_STRESS). |
| src/coreclr/inc/clrconfigvalues.h | Simplifies CdacStress config description now that layout is more complex. |
| eng/pipelines/runtime-diagnostics.yml | Adds windows_x86 to runtime-diagnostics pipeline parameter default list. |
| docs/design/datacontracts/RuntimeTypeSystem.md | Documents new RTS APIs and flags. |
| docs/design/datacontracts/GCInfo.md | Updates x86 behavior documentation and adds x86 specifics section. |
| docs/design/datacontracts/CallingConvention.md | Adds new CallingConvention contract design doc. |
Comment on lines
+8
to
+14
| public interface ICallingConvention : IContract | ||
| { | ||
| static string IContract.Name => nameof(CallingConvention); | ||
|
|
||
| bool TryComputeArgGCRefMapBlob(MethodDescHandle methodDesc, out byte[] blob) | ||
| => throw new NotImplementedException(); | ||
| } |
Comment on lines
+65
to
+66
| if (inBuffer is null || inSize < (uint)Unsafe.SizeOf<DacStressArgGCRefMapRequest>()) | ||
| return HResults.E_INVALIDARG; |
Comment on lines
+1
to
+9
| <Project Sdk="Microsoft.NET.Sdk"> | ||
| <PropertyGroup> | ||
| <LangVersion>latest</LangVersion> | ||
| <!-- This debuggee deliberately defines many "test scaffolding" types and | ||
| enums where StyleCop's one-value-per-line / no-unassigned-fields | ||
| conventions don't add value. Extend the parent NoWarn. --> | ||
| <NoWarn>$(NoWarn);SA1136;CS0649</NoWarn> | ||
| </PropertyGroup> | ||
| </Project> |
Comment on lines
+62
to
+86
| public int GetSize() | ||
| { | ||
| // Constructed pointer/array/byref args always occupy one TADDR slot | ||
| // in the transition block (the actual pointee is reached via the | ||
| // pointer value, not stored inline). When _kindOverride is set, the | ||
| // underlying TypeHandle may be null (uncached PTR), so GetBaseSize | ||
| // would fault. | ||
| if (_kindOverride is CdacCorElementType.Ptr | ||
| or CdacCorElementType.Byref | ||
| or CdacCorElementType.SzArray | ||
| or CdacCorElementType.Array) | ||
| { | ||
| return PointerSize; | ||
| } | ||
|
|
||
| if (_typeHandle.IsNull) | ||
| return 0; | ||
|
|
||
| // GetBaseSize returns the full object size including object header and padding. | ||
| // For value types used in calling convention, we need the unboxed size. | ||
| // BaseSize = ObjHeader + MethodTable* + unboxed fields, aligned to pointer size. | ||
| // Unboxed size = BaseSize - 2 * PointerSize (subtract ObjHeader + MT pointer). | ||
| uint baseSize = Rts.GetBaseSize(_typeHandle); | ||
| return (int)(baseSize - (uint)(2 * PointerSize)); | ||
| } |
…ots stackref tests Builds on the partial x86 IGCInfo support added in dotnet#129456 by porting the remaining decoder pieces required for GC-root scanning on x86, so that `IStackWalk.WalkStackReferences` returns live frame slots on x86 cDAC. The x86 GC info uses the legacy bit-packed `InfoHdr` byte-stream encoding (`src/coreclr/vm/gc_unwind_x86.inl`, `src/coreclr/inc/gcdecoder.cpp`) instead of the modern `GcInfoDecoder` shared by other architectures, so the implementation lives entirely on the existing `X86GCInfo` decoder under `Contracts/GCInfo/X86/`. Changes ------- * `X86GCInfo`: add `UntrackedSlots` lazy property + `DecodeUntrackedSlots()` -- delta-decoded signed varints with the double-align-frame rebase from `gc_unwind_x86.inl:3467`. * `X86GCInfo`: add `VarPtrLifetimes` lazy property + `DecodeVarPtrLifetimes()` -- triplets of (varOffs, begOffs delta, endOffs delta) for EBP-frame tracked locals. * Two new public record types `UntrackedSlot` and `VarPtrLifetime` capture the decoded entries. * `IsCodeOffsetInProlog` / `IsCodeOffsetInEpilog` helpers (offset-parameterised, so EnumerateLiveSlots can answer for any instruction offset without re-constructing X86GCInfo). * `RegMaskToRegisterNumber` helper maps the single-bit `RegMask` flags-enum values to the x86 ModRM register numbers used by `X86Context.TryReadRegister` and `LiveSlot.RegisterNumber`. * Implement `IGCInfoDecoder.EnumerateLiveSlots(uint offset, options)`: early-return empty in prolog/epilog (or aborted+non-interruptible), emit untracked locals (suppressed for filter funclets), emit VarPtr lifetimes covering `offset`, walk `Transitions` up to `offset` accumulating live registers + pushed pointer args, and emit a partially-interruptible `GcTransitionCall` exactly at `offset`. * Flip `IGCInfoDecoder.GetSizeOfStackParameterArea` from `NotSupportedException` to `return 0` for x86 -- x86 has no separate outgoing-argument scratch area; per-offset transitions report pushed args directly, so the GcScanner scratch-area filter is a no-op (correct). * Remove the `[SkipOnArch("x86", "GCInfo decoder does not support x86")]` markers on `GCRoots_WalkStackReferences_FindsRefs` and `GCRoots_RefsPointToValidObjects`. * `DumpTests.targets`: add optional `DebuggeeFilter=<Name>` to restrict `GenerateAllDumps` to a single debuggee. Useful for iterative local x86 work where some other debuggee's publish may fail. * `docs/design/datacontracts/GCInfo.md`: enumerate which `IGCInfoDecoder` APIs are wired up on x86. Out of scope (deferred) ----------------------- * `GetInterruptibleRanges` for x86 -- the only consumer is the catch-handler PC override in `StackWalk_1`; no x86-relevant scenarios today. * "this"-pointer special-case reporting for synchronized methods (VarPtr 0x2 bit currently masked out). * IPtrMask interior-pointer bitmaps for pushed args (uses the simpler per-push `Iptr` flag). * Funclet handling beyond the existing `IsParentOfFuncletStackFrame` caller-side early-skip. * Finer `IsActiveFrame` register filter precision. Validation ---------- * All 2525 cDAC unit tests pass. * The two unblocked `GCRoots_*` tests pass against a freshly generated x86 GCRoots dump. * Broader `DumpTests` x86 sweep: 34 pass / 46 fail / 830 skip -- net +2 vs. before this change (the two GCRoots tests), zero regressions. The 46 pre-existing failures are all unrelated to GCInfo (`ThreadDumpTests` / `ComWrappersDumpTests` / `RuntimeInfoDumpTests` / `WorkstationGCDumpTests` and similar). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The intro paragraph was getting wordy with the per-API status; pull that content out into a new "x86 specifics" section at the end of the file with a table covering supported/not-implemented APIs and a deferred-edges list. Intro now just notes that x86 is partially supported and links to the section. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Context: x86 has used the funclet EH model since PR dotnet#115957 / dotnet#122872 (I had previously assumed otherwise and documented this API as intentionally not implemented). Catch funclet unwinding does call into `IGCInfo.GetInterruptibleRanges` via the parent-frame PC override path in `StackWalk_1.WalkStackReferences`, so throwing `NotSupportedException` is a real correctness gap (silently swallowed by the per-frame try/catch and producing missed parent-frame GC roots). Implementation -------------- Match the semantics of the native x86 walker (`EnumGcRefsX86` in `gc_unwind_x86.inl`): * Fully interruptible (`Header.Interruptible == true`): emit one range per gap between the prolog end and each epilog start, plus a final range from the last epilog end to `MethodSize`. This mirrors the native walker's prolog/epilog short-circuit return at line 3091. * Partially interruptible: walk `Transitions` and emit a single-byte `(offset, offset + 1)` range for each `GcTransitionCall`. Per the native partial-interrupt encoding doc (`gc_unwind_x86.inl:1066+`), call sites are the only GC-safe points in this mode -- the intervening `GcTransitionRegister` / `GcTransitionPointer` / `StackDepthTransition` / `IPtrMask` / `CalleeSavedRegister` events are all bookkeeping, not safe points. Docs ---- Update `docs/design/datacontracts/GCInfo.md` x86 specifics: * Move `GetInterruptibleRanges` from "Not implemented" to "Implemented" in the supported APIs table. * Remove the false claim that x86 has no funclets / no x86-relevant scenarios for this API. * Reference PR dotnet#115957 (Enable new exception handling on win-x86) for context on the EH model. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pure file rename (git mv) -- the class was renamed `GCInfo` -> `X86GCInfo` in dotnet#129456 to avoid collision with the empty IGCInfo fallback struct, but the file kept the old name. Bring the filename in line. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ed bits
I had been masking out the 0x2 bit for VarPtr lifetimes, claiming it
meant "this" pointer not pinned for tracked locals. That's wrong on
modern x86: the native VarPtr loop in gc_unwind_x86.inl:3610-3613
explicitly says
// First Bit : byref
// Second Bit : pinned
// Both bits are valid
flags |= lowBits;
The this_OFFSET_FLAG = 0x2 interpretation in gcinfo.h was scoped to the
legacy JIT32_ENCODER on x86 without funclets, which has been gone since
dotnet#115957 enabled funclet EH on win-x86 (and dotnet#122872 removed the rest).
Pass LowBits straight through into LiveSlot.GcFlags for VarPtr entries,
matching what we already do for untracked slots. Update the
VarPtrLifetime LowBits xmldoc to drop the wrong "this"-pointer note.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GcTransitionRegister and GcTransitionPointer constructors took `isThis` and `iptr` parameters but silently dropped them on the floor. The default-false properties survived past every `0xBF` interior-pointer prefix, every `0xBC` this-pointer prefix, and every byref CallRegister, so the LIVE state walker's iptr accumulator was always 0. Stress tests on Windows x86 saw ~30 register-resident interior-pointer mismatches per debuggee in CoreLib code (e.g. EventSource.DefineEventPipeEvents) where RT reported Reg=EDI Flags=0x1 (interior) and cDAC reported Flags=0x0. Add the missing `IsThis = isThis; Iptr = iptr;` assignments. Drops the BasicAlloc x86 stress mismatch count from 30 to 0 and the full-suite total from 316 to 14 (99.984% match). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Native gcdumpx86.cpp:361 declares `curOffs = 0` once *outside* the partial-interrupt EBP-frame walk loop and accumulates code-offset deltas across iterations. cDAC's `GetTransitionsEbpFrame` declared it *inside* the `while (true)` loop body, so it was reset to 0 every iteration -- causing all partial-interrupt EBP-frame call-site transitions to be emitted at small per-iteration deltas instead of true cumulative method offsets. Effect: callee-saved register liveness reported by RT on real call sites in CoreLib R2R'd code (where the JIT typically keeps GC refs in EBX/ESI/EDI across calls) was missed by cDAC's EnumerateLiveSlots, because activeCallSite never matched at the queried offset. Stress tests on Windows x86 saw ~3K such under-reports per BasicAlloc run (1480 EDI + 1284 EBX + 476 ESI), concentrated in System.Diagnostics.Tracing.EventSource.* methods. Move `uint curOffs = 0;` outside the loop to mirror native, dropping the BasicAlloc x86 stress mismatch count from 1742 to 30 (98.0% -> 99.97% match). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Four related correctness fixes in X86GCInfo's `EnumerateLiveSlots` and `EnumerateTransitionLiveSlots`, mirroring native EnumGcRefsX86 in gc_unwind_x86.inl. These were uncovered by running the cDAC GC stress tests against Windows x86 corerun and comparing cDAC's slot output against the runtime's at every managed allocation. 1. **Pushed-pointer-arg address calculation** -- Native at gc_unwind_x86.inl:3262 addresses bit `i` of the args mask at `pPendingArgFirst - i*sizeof(DWORD)`, where pPendingArgFirst is the FIRST-pushed-arg address (highest among pushed args). cDAC was storing pushed pointers keyed by `-depthSlots*4` and emitting them as SP_REL slots with the negative offset, resolving to addresses *below* the call-site SP -- impossible for live args. Switched to storing by push-index (depth-at-push-time, 0-indexed) and translating to positive SP-relative offset `(finalDepth - 1 - pushIndex) * 4` at emit time. 2. **ESP-only push/pop semantics** -- GcArgTable encodes ESP push/pop bytes as `GcTransitionRegister` with `RegMask.ESP` and Action.PUSH/ POP, but those mean "stack-depth tracking" (non-pointer args), not "pointer push". cDAC was treating them as pointer pushes (adding phantom entries to pushedPtrs) and was filtering them through the regOffset gate (which is for register-liveness LIVE/DEAD events only, not arg-stream events). Now ESP-only pushes only advance depth, and the regOffset gate fires only for `Action.LIVE`/ `Action.DEAD`. 3. **Register-state offset on non-leaf frames** -- Native EnumGcRefsX86 uses `curOffsRegs = curOffs - 1` for non-active stack frames because register liveness can change across calls (gc_unwind_x86.inl:3149+). Mirror that: register-liveness LIVE/DEAD events at offset > regOffset are skipped on non-active frames. 4. **VarPtr lifetimes on EBP frames** -- Native at gc_unwind_x86.inl:3567-3573 negates the encoded VarPtr stack offset for EBP frames (the encoded value is positive but means EBP-relative-negative for locals). cDAC was leaving it positive, reporting locals at the wrong addresses. Also use `curOffs-1` for the lifetime-range check on non-active frames, mirroring gc_unwind_x86.inl:3540-3548 (a variable could be dead at the return address if the call was the last instruction of a try and the return jumps to a catch handler). 5. **Callee-trashed register filtering** -- On non-active frames the callee will have overwritten EAX/ECX/EDX, so any GC refs they held at the call site are stale. Native gates these via the `ActiveStackFrame` flag at gc_unwind_x86.inl:3189-3199; mirror that in EnumerateTransitionLiveSlots. Combined effect on Windows x86 stress (BasicAlloc, Checked): Before: 59,514 matched / 27,702 mismatched (68%) After: 87,778 matched / 0 mismatched (100%) Across the full 9-debuggee suite: 43,771 PASS / 14 FAIL (99.97%). The remaining 14 are filter-funclet / state-context edge cases tracked separately. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rgMask call sites Two follow-up correctness fixes for residual stress-test mismatches. 1. **ParentOfFuncletStackFrame early-return** -- Native EnumGcRefsX86 (gc_unwind_x86.inl:3039) returns without reporting any GC refs when the stack walker has flagged the frame as a parent whose locals are already being reported via a funclet (e.g. a catch handler). cDAC's `StackWalk_1` does set `GcSlotEnumerationOptions.IsParentOfFuncletStackFrame` correctly via `gcFrame.ShouldParentToFuncletSkipReportingGCReferences`, but `X86GCInfo.EnumerateLiveSlots` was ignoring it and emitting the parent's slots a second time. This caused stress mismatches in exception-handling scenarios where a catch funclet appears alongside its parent (e.g. `Program.NestedExceptionScenario`): cDAC reported 9 refs on the parent frame while RT correctly reported 0. 2. **Call-site `ArgMask` iteration** -- For partially-interruptible call sites, GcArgTable populates either `GcTransitionCall.PtrArgs` (huge `0xFB` encoding with explicit per-pointer offsets) or `GcTransitionCall.ArgMask` / `IArgs` (tiny / small / medium / large encodings with a bitmap of pushed-pointer slots). cDAC was only iterating `PtrArgs`, silently dropping every call site that used the bitmap form. Native scanArgRegTable (gc_unwind_x86.inl:3373-3402) walks the bitmap low-to-high with `argAddr = ESP + i*sizeof(DWORD)`; mirror that. Affected `System.Diagnostics.Tracing.ManifestBuilder. CreateManifestString` and similar large CoreLib methods where the missing slot was always at `SP+0` (the lowest-bit pushed pointer). Combined with the prior commits in this branch, x86 cDAC GC root walking now reports identical results to the runtime across all 9 stress debuggees: Pre-session baseline: 87,294 frames / 27,702 mismatched (68% match) This commit: 803,556 frames / 0 mismatched (100%) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…LiveSlots Tighten comments in `X86GCInfo.cs` and `GCArgTable.cs`: - Replace `gc_unwind_x86.inl:NNNN` line-number citations with file + function-name references (line numbers churn). Cite `EnumGcRefsX86`, `scanArgRegTableI`, `scanArgRegTable`, etc. - Compact the per-block "why" comments to a single concise sentence, removing mechanical "we do X because native does X" duplication. - Tag the regOffset/curOffsRegs note in `EnumerateTransitionLiveSlots` to point at the equivalent native field rather than line numbers. Document the now-functional `EnumerateLiveSlots` in `docs/design/datacontracts/GCInfo.md`. Update the Supported APIs table to reflect that it is implemented and stress-validated, and replace the stale Deferred edges list (now-resolved items dropped, remaining items reframed as "not exercised by current x86 codegen / not stressed"). No behavioral changes; full stress suite still reports 0 mismatches. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reverts commit cb82b21 to keep the file name as `GCInfo.cs`. Even though the class inside is `X86GCInfo` (renamed in dotnet#129456 to avoid colliding with the empty `Contracts.GCInfo` IGCInfo fallback struct), the file name churn shows up as a delete+add in the PR diff, which is harder to review. The C# class name does not need to match the file name. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…re now implemented Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ction Drop the per-API x86 callouts in the IGCInfoHandle interface comments and the intro cross-reference. All x86 detail now lives in the dedicated 'x86 specifics' section, minimizing the diff footprint outside that section. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All APIs are now implemented on x86; the table was just noise. Distill the two truly x86-specific behavior notes (GetSizeOfStackParameterArea returns 0, GetInterruptibleRanges encoding) into a short bullet list in the intro of the x86 specifics section. The EnumerateLiveSlots behavior subsection is unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Was added as a local convenience to restrict GenerateAllDumps to a single debuggee during iterative x86 work. Not needed in the shipping infrastructure. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Narrow visibility of decoder-internal types and properties: `UntrackedSlot`, `VarPtrLifetime`, `UntrackedSlots`, `VarPtrLifetimes` are only consumed by `EnumerateLiveSlots`, so make them `internal` (was `public`) to avoid expanding the contracts assembly surface. - Replace hardcoded `* 4` with `_target.PointerSize` in the pushed-arg emit and the ArgMask bitmap iteration -- consistent with the rest of the decoder. - Fix stale VarPtr comment that claimed `0x2` was "this NOT pinned". Modern x86 JIT uses `0x2` for pinned (matching `LiveSlot.GcFlags`); the legacy `this` interpretation only applied to JIT32_ENCODER which is no longer in use. The encoded bits already passed through correctly -- this is a comment fix only. - Implement `GcSlotEnumerationOptions.ReportFPBasedSlotsOnly` on x86 as a post-filter that drops register slots and non-frame-relative stack slots, matching `GCInfoDecoder.ReportSlot` semantics on other arches. Update GCInfo.md to mention the filter under the x86 specifics section. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…errupt EBP-less call sites Native scanArgRegTable (gc_unwind_x86.inl) decrements stackDepth by callArgCnt at every call encoding in the EBP-less partial-interrupt walker. cDAC was emitting the delta as positive, which the consumer (EnumerateLiveSlots) was then *adding* to depthSlots -- the opposite direction. Negate at the four StackDepthTransition emission sites in GetTransitionsNoEbp so the consumer's `depthSlots += delta` matches native semantics. In practice this path is unreachable on current x86 codegen because all post-funclet x86 methods are EBP frames (PR dotnet#115957), and the cDAC GC stress suite still reports 0 mismatches across all 9 debuggees -- but the bug was real and the fix is mechanical. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… pushes GcArgTable's GetTransitionsFullyInterruptible emits non-pointer arg pushes (encoding 0xB0..0xB7) as GcTransitionPointer with IsPtr=false. EnumerateLiveSlots was unconditionally recording every PUSH into pushedPtrs, which would incorrectly report non-pointer slots as GC roots. Skip the pushedPtrs entry when IsPtr is false (still advance depthSlots so positional offsets stay correct), matching the RegMask.ESP path in ApplyRegisterTransition. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…est platforms The x86 EnumerateLiveSlots / GetInterruptibleRanges implementation in this PR is validated locally to match EnumGcRefsX86 across all 9 stress debuggees (0 mismatches, 803K frames). Enable CI coverage by including windows_x86 in cdacStressPlatforms, matching the existing windows_x86 entries in cdacDumpPlatforms and cdacXPlatDumpPlatforms. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…by pushedSize Native EnumGcRefsX86 computes `argBase = ESP + pushedSize` for ESP frames when iterating the untracked locals table (and VarPtr if present), so an encoded `stkOffs` resolves to `ptrAddr = ESP + pushedSize + stkOffs`. The SP-relative offset cDAC needs to emit is therefore `pushedSize + stkOffs`, not bare `stkOffs`. Compute the pushed-arg size at the queried instruction offset and add it to non-EBP-relative untracked / VarPtr slot offsets. EBP-frame slots are already FRAMEREG-relative and need no adjustment. Refactor `CalculatePushedArgSize()` to delegate to a new `CalculatePushedArgSizeAt(uint codeOffset)` so both the existing `RelativeOffset`-bound `PushedArgSize` property and the new `EnumerateLiveSlots(instructionOffset, ...)` consumer share the walk. Unreachable on current x86 codegen (all post-funclet x86 methods use EBP frames, see dotnet#115957), but the bug was real and would have produced wrong addresses on any future ESP-frame methods. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Native DynamicHelperFrame::GcScanRoots_Impl (frames.cpp) uses
`offsetof(ArgumentRegisters, ECX)` for ObjectArg and
`offsetof(ArgumentRegisters, EDX)` for ObjectArg2 on x86, where the
ArgumentRegisters struct is declared in REVERSED calling-convention
order via ENUM_ARGUMENT_REGISTERS_BACKWARD (cgencpu.h: { EDX, ECX } so
EDX is at offset 0 and ECX at offset +PointerSize). On every other
architecture the struct lays the registers out in forward call order
(first arg at offset 0, second at offset +PointerSize), and the
non-x86 native path adds `sizeof(TADDR)` for ObjectArg2.
cDAC's ScanDynamicHelperFrame assumed the uniform non-x86 layout, so on
x86 it reported ObjectArg at EDX's location and ObjectArg2 at ECX's
location -- the addresses came out 4 bytes off from the runtime, which
surfaced as flaky stress-test mismatches once we added windows_x86 to
cdacStressPlatforms (the previous cDAC stress matrix excluded x86, so
this code path was never exercised against the live runtime).
Detect TARGET_X86 via RuntimeInfo.GetTargetArchitecture and swap the
two offsets to match native semantics. Verified across 5 BasicAlloc
stress runs: 0 mismatches each.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- VarPtrLifetimes XML doc + EnumerateLiveSlots comment: drop the incorrect claim that the VarPtr table is "only EBP frames / empty for ESP frames". Native processes the table on both frame kinds; only the per-entry `stkOffs` negation is EBP-frame-specific. - GetInterruptibleRanges: include `epilogStart` in the preceding interruptible range (end at `eStart + 1`, clamped to method size). IsCodeOffsetInEpilog uses a strict `>` check, so epilogStart itself is NOT in the epilog and EnumerateLiveSlots will happily report slots there -- the range list now matches. - CalculatePushedArgSizeAt: clamp `depth` to >= 0 before casting to uint. StackDepthTransition can carry negative deltas (call-site arg pops in the partial-interrupt ESP-frame encoding) and a transient underflow would otherwise wrap to a huge unsigned, corrupting espBias and the pushed-arg offset calculation. - Cache sorted transition offsets in a `Lazy<ImmutableArray<int>>` instead of re-running `Transitions.Keys.OrderBy(...)` on every EnumerateLiveSlots / CalculatePushedArgSizeAt / GetInterruptibleRanges call. EnumerateLiveSlots fires once per managed frame during stack walking, so this is a hot path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three issues prevented x86 stress from passing in CI (build 1473187): 1. Leftover Console.WriteLine debug output in GCArgTable.cs flooded the Helix log with thousands of "CallArgCount/CallPndTabCnt/lastSkip/stkOffs" lines and made test triage impossible. Removed all four diagnostic writes plus their now-dead locals (stkOffs/lowBit, lastSkip+param, callPndTabSize, callPndTab read-without-use). 2. CalculatePushedArgSizeAt threw "Unsupported gc transition type" (InvalidOperationException, HRESULT 0x80131509) whenever an ESP-frame walk hit a CalleeSavedRegister transition (emitted by the partial- interrupt no-EBP decoder, case 7 / 0x04). Native scanArgRegTable treats this tag as informational (no effect on outgoing-arg depth); the cDAC switch was missing the case so every ESP-frame stack with a callee-saved-reg tag aborted GetStackReferences with hr=0x80131509. This was the source of 18-70 [FAIL] entries per debuggee in CI. 3. AddressFromGCRefMapPos used the same forward-from-FirstGCRefMapSlot formula on every architecture, but x86's TransitionBlock lays out its two argument registers (EDX/ECX) backward via ENUM_ARGUMENT_REGISTERS_BACKWARD. Native OffsetFromGCRefMapPos (frames.cpp) reverses pos 0/1 within the arg-regs area on x86. Without this fix, ExternalMethodFrame GCRefMap-driven scans reported the wrong arg-reg slot for the first two positions. Stack-arg positions (pos >= 2) remain forward, matching native. Local PInvoke debuggee under DOTNET_CdacStress=0x001: Before fix #1+#2: 70 fail / 64926 matched / 0 mismatched (every [FAIL] was an InvalidOperationException, not a real mismatch). After #1+#2: 2 fail / 66250 matched / 2 mismatched. After all three: 2 fail (remaining is a separate x86 GCRefMap decode bug in pos >= 2 stack args, tracked as follow-up). 5 of 9 debuggees now report 0 failures and 0 mismatches.
After enabling stack walking on x86 (PR dotnet#129547), every ExternalMethodFrame with stack arguments produced a constant 20-byte (5-slot) offset error in cDAC's reported GC roots vs the runtime oracle. Root cause: x86's TransitionBlock layout is: +0 m_argumentRegisters (8 bytes, EDX/ECX, BACKWARD-ordered) +8 m_calleeSavedRegisters (16 bytes) +24 m_ReturnAddress (4 bytes) +28 (== sizeof(TransitionBlock) == OffsetOfArgs) caller-pushed stack args The GCRefMap position scheme on x86 splits positions across these regions: pos 0..1 -> arg regs, reversed (mirrors ENUM_ARGUMENT_REGISTERS_BACKWARD) pos 2..N -> stack args, starting at sizeof(TransitionBlock) There is a 20-byte gap (CalleeSavedRegisters + ReturnAddress) that the GCRefMap deliberately skips. Native OffsetFromGCRefMapPos handles this with an explicit OffsetOfArgs + (pos - NUM_ARGUMENT_REGISTERS) * sizeof(TADDR) branch (frames.cpp). cDAC was using the non-x86 default FirstGCRefMapSlot + pos * PointerSize even on x86. Since FirstGCRefMapSlot = offsetof(m_argumentRegisters) = 0 on x86, pos * 4 walked straight from arg-regs into the CalleeSavedRegisters area, missing the real stack args by 20 bytes. Fix: - Add OffsetOfArgs as a [FieldAddress] on Data.TransitionBlock (the descriptor already exposes it as sizeof(TransitionBlock)). - In AddressFromGCRefMapPos, when running on x86, use ArgumentRegisters as the base for pos < NUM_ARGUMENT_REGISTERS (reversed) and OffsetOfArgs as the base for pos >= NUM_ARGUMENT_REGISTERS. Local stress matrix after this fix (9 debuggees, two consecutive runs): fail=0 mismatched=0 across 47,164 verifications / 595,488 frames. Previously: 2 fail / 2 mismatched per affected debuggee, intermittent.
…C hole) After x86 EnumerateLiveSlots is fully working, ~20% of full stress suite runs surface a pre-existing GC hole during the runtime's first EventSource cctor cascade. The runtime's RT-side scanner sees GC refs in callee-saved registers (ESI/EDI) that the DAC-side scanner doesn't, because a PrestubMethodFrame holding the live regs gets popped before the GC stackwalk completes during exception dispatch. This is independently tracked as runtime issue dotnet#129545/dotnet#129546 and is not specific to the cDAC. Documenting it in known-issues.md so future triagers can recognize the signature instead of chasing it as a new cDAC bug.
…otnet#129546) Previous commit incorrectly attributed the x86 stress flake to issues dotnet#129545/dotnet#129546. Those are x64-specific GC-stress crashes (assertion in MethodTable::Validate during managed exception unwind), not cDAC mismatches and not x86. Local x64 stress is clean. The actual x86 flake is a cDAC bug: EnumerateLiveSlots returns 0 live slots for some non-leaf managed frames (SplitName / GetNestedType / EventSource.Initialize / etc) where the runtime reports refs in callee-saved registers (ESI/EDI). Diagnostic instrumentation of HasFrameBeenUnwoundByAnyActiveException ruled it out as the cause (false in 100% of ~77k invocations during a reproduced flake). The most likely cause is one of: - partially-interruptible call-site IP matching off-by-one or wrong semantic (offset of return address vs call instruction); - x86 unwinder not propagating callee-saved register values across unwinds the same way native REGDISPLAY does (cDAC re-unwinds via Context.Clone().Unwind() each call; native caches pCallerContext); - some x86-specific frame-type handling difference in StackWalk_1. Documenting the corrected attribution and the open investigation in known-issues.md so this isn't conflated with a different issue. The actual root-cause investigation (comparing the IP cDAC and the runtime see for the failing frames) is a follow-up beyond this PR's scope.
The pre-PR cDAC port of partial-interruptible EBP-frame decoding had `curOffs = readUInt32(...)` for case 0xFB (huge call-pattern encoding), which OVERWRITES the current method-relative offset instead of advancing it by the encoded 32-bit code DELTA. Per the JIT encoder docs (src/coreclr/jit/gcencode.cpp ~line 2902): `0xFB [0BSD0bsd][32-bit code delta]` And the runtime decoder (src/coreclr/vm/gc_unwind_x86.inl:1041): `scanOffs += *dac_cast<PTR_DWORD>(table); table += sizeof(DWORD);` Confirmed both encoder and runtime use `+=`. cDAC should match. (Note: the dump tool `src/coreclr/gcdump/i386/gcdumpx86.cpp:650` and the R2R reflection reader `src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/x86/GcInfo.cs:303` both use `=` -- they appear to have inherited the same bug from some early port. Those need a separate fix; addressing only the cDAC copy here because that's what x86 stack walking depends on.) Exposed by x86 stress: long managed methods (e.g. EventSource cctors) that hit the huge call-pattern encoding had their decoded call-site table truncated to the first call after a 0xFB tag, and `EnumerateLiveSlots` then returned empty register sets at IPs past that point.
In the partial-EBP-frame decoder, when `val & 0x80 == 0 && val & 0x0F == 0` the byte encodes which callee-saved register holds the 'this' pointer at the next call site -- it is metadata, NOT a call entry itself (see native gc_unwind_x86.inl ~line 970: sets thisPtrReg, no call recorded). The pre-PR cDAC port created a spurious `GcTransitionCall` at the current curOffs with only the this-pointer register in CallRegisters. If the partial-EBP encoding emitted a this-pointer tag at the same curOffs as a real call site, the spurious entry would be appended to `Transitions[curOffs]` AFTER the real call entry. `EnumerateLiveSlots` iterates the list and overwrites `activeCallSite` each time -- so the spurious entry (with one reg) would win, losing the real call's full `CallRegisters` (potentially including ESI/EDI/EBX). This change discards the this-pointer metadata (we don't use thisPtrReg in cDAC's stress comparison) so it can't pollute the call-site table. NOTE: this is a correctness fix matching native semantics. It did not materially reduce the x86 stress flake rate -- the EventSource-cctor mismatch pattern persists, indicating at least one other root cause in either the GCInfo decoder or the stack walker's IP for those frames. That separate investigation is tracked in known-issues.md and the session plan.md.
The doc was missing notes added by recent x86 decoder commits: - ESP-frame pushedSize bias for untracked/VarPtr slots - Non-pointer push (IsPtr=false) stack-depth handling - IsParentOfFuncletStackFrame as an actual code path Added a new 'Encoding correctness notes (x86)' subsection capturing the four byte-stream subtleties caught during stress validation: - 0xFB huge-encoding code-delta is cumulative (+=, not =) - Partial-EBP this-pointer tag does not record a call entry - Partial-interrupt EBP-less call sites emit negative depth delta - 0xC0..0xCF is a callee-saved-only call entry distinct from 0xFD..0xFF Removed the obsolete 'all post-funclet x86 methods are EBP frames' claim from the deferred-edges section and clarified IPtrMask handling. > [!NOTE] > This commit was authored with assistance from GitHub Copilot. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The runtime's TransitionFrame::UpdateRegDisplay_Impl on x86 adds cbStackPop
(the callee-popped argument byte count) to the unwound caller SP
(src/coreclr/vm/i386/cgenx86.cpp:104-117). cDAC's base HandleTransitionFrame
omitted this adjustment, which on x86 propagates through subsequent
EBP-chain unwinds and causes the walker to skip frames, surfacing as the
~36% cdacstress flake reproduced on the EventSource cctor cascade.
x64/arm64 are unaffected because RtlVirtualUnwind resyncs each step from
PE unwind data; x86 has no such mechanism and accumulates the error.
Add an x86 override on X86FrameHandler.HandleTransitionFrame:
* PInvokeCalliFrame -> VASigCookie.SizeOfArgs (no MethodDesc available).
* Every other transition Frame -> X86ArgIterator.Compute on the Frame's
MethodDesc.
X86ArgIterator is a minimal x86-only signature walker patterned on the
native ArgIteratorTemplate::ForceSigWalk (callingconvention.h:2094-2200).
It mirrors only the cbStackPop computation - it does not enumerate
individual arg locations, classify HFA / SystemV structs, etc. The full
ArgIterator port is tracked separately.
Adds Data.PInvokeCalliFrame + cdac_data + datadescriptor entry so the
handler can read VASigCookie.SizeOfArgs.
Measured flake rate on the x86 cdacstress suite (25x runs):
baseline 9/25 (36%)
with this change ~0/25 (under measurement)
> [!NOTE]
> This commit was authored with assistance from GitHub Copilot.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ArgIterator infrastructure
Squashes the 33 commits unique to the cdac-shared-argiterator branch onto
this branch as a single commit. Brings in the shared ArgIterator port +
the cDAC ICallingConvention contract (TryComputeArgGCRefMapBlob) +
ARGITER stress framework. Sets the foundation for replacing the x86-only
X86ArgIterator with the shared infrastructure.
Adjustments made on top of the raw squash:
* MethodTableFlags_1.WFLAGS_LOW: add the missing GenericsMask_SharedInst
(0x00000020) enum value that cdac-shared-argiterator's
IsSharedByGenericInstantiations property references.
* src/coreclr/tools/Common/CallingConvention/ArgIterator.cs: replace 10x
'new SomeStruct()' with 'default(SomeStruct)' to satisfy SA1129
(the file moved from tools/Common where the analyzer was disabled).
> [!NOTE]
> This commit was authored with assistance from GitHub Copilot.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the X86ArgIterator helper added earlier in this PR with calls
into the shared CallingConvention contract that was squashed in from the
cdac-shared-argiterator branch. The net effect on x86 GCRefStress is
0 failures, 0 known issues (down from 0.12-5.66% known previously).
Changes
-------
* GCRefMapDecoder: add a second constructor that takes a host-side
byte[] in addition to the existing TargetPointer source. Lets us
decode blobs we synthesize via ICallingConvention.TryComputeArgGCRefMapBlob
using the same bit-stream reader as the R2R-resident blob path.
* GcScanner.PromoteCallerStack: replace the RecordDeferredFrame stub
with a real implementation. On Windows x86 / x64 it asks the shared
CallingConvention contract for the synthesized GCRefMap blob and
runs the same token-iteration loop as the R2R-backed
PromoteCallerStackUsingGCRefMap. On unsupported targets and on any
failure to synthesize a blob, falls back to RecordDeferredFrame so
the stress harness still buckets the resulting diff as a known
issue. Factored the token-iteration loop into a shared
EnumerateGCRefMapTokens helper.
* X86FrameHandler.HandleTransitionFrame: replaces the X86ArgIterator
cbStackPop calculation with TryComputeArgGCRefMapBlob + ReadStackPop
on the leading bytes. Same call site keeps the VASigCookie fast
path for PInvokeCalliFrame.
* X86ArgIterator.cs: deleted.
* CdacStressTests.cs: removed the hardcoded
if (arch == Architecture.X86) throw new SkipTestException for
GCRefStress_AllVerificationsPass. The previous skip comment said
"x86 has not been brought up yet" -- this PR brings it up.
* CdacStressTestBase.AssertAllPassed: on Windows x86 / x64, fail the
test if results.KnownIssues > 0. Every transition Frame's
caller-stack scan must succeed via the shared
ICallingConvention.TryComputeArgGCRefMapBlob path on those targets;
a non-zero count signals a regression in
CallingConvention_1.ComputeArgGCRefMapBlobCore.
Local validation (windows-x86 Checked)
-------------------------------------
GCRefStress, single full run:
Debuggee Total Pass Fail Known
------------- ------- ------- ------- -------
BasicAlloc 4,792 4,792 0 0
CallSignatures 5,240 5,240 0 0
Comprehensive 4,862 4,862 0 0
CrossModule 4,846 4,846 0 0
DeepStack 4,828 4,828 0 0
DynamicMethods 6,356 6,356 0 0
ExceptionHandling 4,834 4,834 0 0
Generics 4,806 4,806 0 0
MultiThread 4,878 4,878 0 0
PInvoke 4,818 4,818 0 0
StructScenarios 4,834 4,834 0 0
------------- -------
Total 55,094 55,094 0 0
cDAC unit tests: 2571 / 2571 pass.
> [!NOTE]
> This commit was authored with assistance from GitHub Copilot.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7648cd9 to
8bed0e2
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note
This PR was authored with assistance from GitHub Copilot. Draft -- run CI to validate the cross-arch behavior of the new
PromoteCallerStackpath before promoting.Scope
This builds on top of two streams of work:
IGCInfoDecoder+IStackWalk.WalkStackReferencesenablement from PR [cdac] x86: implement IGCInfoDecoder.EnumerateLiveSlots; unblock GCRoots stackref tests #129547 (this branch was forked off the same line of decoder fixes and the GCInfo.md doc refresh).ICallingConventioncontract + ARGITER stress framework from @max-charlamb'scdac-shared-argiteratorwork-in-progress branch, which has been squashed into a single commit at the base of this branch (ff6e74e5b49).The new content in this PR is the commit on top of that squash (
7648cd977f8) that rewires the x86 stack-walker to use the shared infrastructure instead of an x86-specific port.Rewire details
X86ArgIterator.cs(262 LOC, x86-only port ofArgIteratorTemplate::CbStackPop)X86FrameHandler.HandleTransitionFrame->X86ArgIterator.Compute(target, md)to derivecbStackPopX86FrameHandler.HandleTransitionFrame->ICallingConvention.TryComputeArgGCRefMapBlob-> decode leadingReadStackPopfrom the blobGcScanner.PromoteCallerStackwas a stub that recorded a deferred frame and returned (caused 0.12-5.66% known-issue rate on x86)PromoteCallerStackon Windows x86 / x64 synthesizes the GCRefMap blob viaICallingConvention.TryComputeArgGCRefMapBloband runs the same token-iteration loop as the R2R-residentPromoteCallerStackUsingGCRefMap. Falls back toRecordDeferredFrameon unsupported targets / blob failures.EnumerateGCRefMapTokens(ref GCRefMapDecoder, TransitionBlock, GcScanContext)shared by bothPromoteCallerStackUsingGCRefMapandPromoteCallerStackGCRefMapDecoderonly accepted aTargetPointersource (used for reading R2R-resident blobs from the loaded image)GCRefMapDecoder(byte[] blob)so host-synthesized blobs can be decoded with the same bit-stream readerStress framework hardening
CdacStressTests.GCRefStress_AllVerificationsPass: removed the hardcodedif (arch == Architecture.X86) throw new SkipTestExceptionblock that the squashed branch left in place. The skip's comment was "x86 has not been brought up yet" -- this PR brings it up.CdacStressTestBase.AssertAllPassed: on Windows x86 / x64, fail the test ifresults.KnownIssues > 0. Every transition Frame's caller-stack scan must succeed via the sharedICallingConvention.TryComputeArgGCRefMapBlobpath on those targets. Other targets keep the existing tolerance because theirPromoteCallerStackstill falls back toRecordDeferredFrame.Local validation (windows-x86 Checked)
GCRefStress (
DOTNET_CdacStress=0x101), single full run:cDAC unit tests: 2571 / 2571 pass.
Known nondeterministic flake to investigate
One earlier run had 2
[FAIL]events on PInvoke (Frames insideSystem.RuntimeType.GetNestedType/EventSource.Initialize-- the EventSource startup cascade we have history with). The Frames were Frameless managed (not transition Frames), so this is inIGCInfoDecoder.EnumerateLiveSlots, not in the newPromoteCallerStackpath. 4 subsequent re-runs of PInvoke alone produced 0 failures each, so the flake is intermittent.Running CI on this draft to:
KnownIssues == 0assertion holds across the windows-x86 / windows-x64 stress legs.