Skip to content

[cdac] x86: rewire stack-walker to shared CallingConvention / GCRefMap#129858

Draft
max-charlamb wants to merge 33 commits into
dotnet:mainfrom
max-charlamb:x86-gc-info-w-argiterator
Draft

[cdac] x86: rewire stack-walker to shared CallingConvention / GCRefMap#129858
max-charlamb wants to merge 33 commits into
dotnet:mainfrom
max-charlamb:x86-gc-info-w-argiterator

Conversation

@max-charlamb

Copy link
Copy Markdown
Member

Note

This PR was authored with assistance from GitHub Copilot. Draft -- run CI to validate the cross-arch behavior of the new PromoteCallerStack path before promoting.

Scope

This builds on top of two streams of work:

  1. The x86 IGCInfoDecoder + IStackWalk.WalkStackReferences enablement from PR [cdac] x86: implement IGCInfoDecoder.EnumerateLiveSlots; unblock GCRoots stackref tests #129547 (this branch was forked off the same line of decoder fixes and the GCInfo.md doc refresh).
  2. The shared cDAC ICallingConvention contract + ARGITER stress framework from @max-charlamb's cdac-shared-argiterator work-in-progress branch, which has been squashed into a single commit at the base of this branch (ff6e74e5b49).

The new content in this PR is the commit on top of that squash (7648cd977f8) that rewires the x86 stack-walker to use the shared infrastructure instead of an x86-specific port.

Rewire details

Old New
X86ArgIterator.cs (262 LOC, x86-only port of ArgIteratorTemplate::CbStackPop) Deleted
X86FrameHandler.HandleTransitionFrame -> X86ArgIterator.Compute(target, md) to derive cbStackPop X86FrameHandler.HandleTransitionFrame -> ICallingConvention.TryComputeArgGCRefMapBlob -> decode leading ReadStackPop from the blob
GcScanner.PromoteCallerStack was a stub that recorded a deferred frame and returned (caused 0.12-5.66% known-issue rate on x86) PromoteCallerStack on Windows x86 / x64 synthesizes the GCRefMap blob via ICallingConvention.TryComputeArgGCRefMapBlob and runs the same token-iteration loop as the R2R-resident PromoteCallerStackUsingGCRefMap. Falls back to RecordDeferredFrame on unsupported targets / blob failures.
Token-iteration loop was duplicated in two callers Factored into EnumerateGCRefMapTokens(ref GCRefMapDecoder, TransitionBlock, GcScanContext) shared by both PromoteCallerStackUsingGCRefMap and PromoteCallerStack
GCRefMapDecoder only accepted a TargetPointer source (used for reading R2R-resident blobs from the loaded image) Added a second constructor GCRefMapDecoder(byte[] blob) so host-synthesized blobs can be decoded with the same bit-stream reader

Stress framework hardening

  • CdacStressTests.GCRefStress_AllVerificationsPass: removed the hardcoded if (arch == Architecture.X86) throw new SkipTestException block that the squashed branch left in place. The skip's comment was "x86 has not been brought up yet" -- this PR brings it up.
  • CdacStressTestBase.AssertAllPassed: on Windows x86 / x64, fail the test if results.KnownIssues > 0. Every transition Frame's caller-stack scan must succeed via the shared ICallingConvention.TryComputeArgGCRefMapBlob path on those targets. Other targets keep the existing tolerance because their PromoteCallerStack still falls back to RecordDeferredFrame.

Local validation (windows-x86 Checked)

GCRefStress (DOTNET_CdacStress=0x101), single full run:

Debuggee Total Pass Fail Known
BasicAlloc 4,792 4,792 0 0
CallSignatures 5,240 5,240 0 0
Comprehensive 4,862 4,862 0 0
CrossModule 4,846 4,846 0 0
DeepStack 4,828 4,828 0 0
DynamicMethods 6,356 6,356 0 0
ExceptionHandling 4,834 4,834 0 0
Generics 4,806 4,806 0 0
MultiThread 4,878 4,878 0 0
PInvoke 4,818 4,818 0 0
StructScenarios 4,834 4,834 0 0
Total 55,094 55,094 0 0

cDAC unit tests: 2571 / 2571 pass.

Known nondeterministic flake to investigate

One earlier run had 2 [FAIL] events on PInvoke (Frames inside System.RuntimeType.GetNestedType / EventSource.Initialize -- the EventSource startup cascade we have history with). The Frames were Frameless managed (not transition Frames), so this is in IGCInfoDecoder.EnumerateLiveSlots, not in the new PromoteCallerStack path. 4 subsequent re-runs of PInvoke alone produced 0 failures each, so the flake is intermittent.

Running CI on this draft to:

  1. Confirm the strict KnownIssues == 0 assertion holds across the windows-x86 / windows-x64 stress legs.
  2. Confirm non-x86/x64 stress legs still tolerate the deferred-frame fallback path.
  3. Get a wider sample of PInvoke / EventSource cascade runs to characterise the flake rate (we'll triage it as a follow-up if it shows up in CI).

@dotnet-policy-service

Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands the cDAC stack-walking and stress infrastructure by wiring Windows x86 transition-frame caller scanning to the shared ICallingConvention/GCRefMap machinery, and by extending the in-proc cdacstress harness to support multiple sub-checks (GCREFS + ARGITER) with richer test/debuggee coverage.

Changes:

  • Reworked caller-stack promotion to optionally synthesize and decode GCRefMap blobs via a new ICallingConvention.TryComputeArgGCRefMapBlob path and shared token enumeration.
  • Extended cdacstress configuration/telemetry (flag layout, [GC_STATS] / [ARG_STATS]) and updated managed stress harness parsing/assertions; added multiple new stress debuggees.
  • Refactored/shared calling convention code (ArgIterator/TransitionBlock/ITypeHandle) for reuse by cDAC and ReadyToRun tooling.

Reviewed changes

Copilot reviewed 60 out of 60 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/native/managed/cdac/tests/StressTests/RunStressTests.ps1 Updates stress flag documentation/defaults to new WHERE/WHAT/MODIFIER layout.
src/native/managed/cdac/tests/StressTests/README.md Documents new flag regions, sub-check markers, and updated defaults.
src/native/managed/cdac/tests/StressTests/known-issues.md Updates defaults and adds documentation for an intermittent x86 flake.
src/native/managed/cdac/tests/StressTests/Debuggees/VarArgs/VarArgs.csproj Adds VarArgs debuggee project file.
src/native/managed/cdac/tests/StressTests/Debuggees/VarArgs/Program.cs Adds varargs-focused ARGITER debuggee.
src/native/managed/cdac/tests/StressTests/Debuggees/StructScenarios/Program.cs Adds nested-struct scenario to stress ArgIterator/GCDesc paths.
src/native/managed/cdac/tests/StressTests/Debuggees/CrossModule/Program.cs Adds cross-module signature/type-resolution debuggee.
src/native/managed/cdac/tests/StressTests/Debuggees/CrossModule/Lib/Types.cs Adds library types used by CrossModule debuggee to force cross-module resolution.
src/native/managed/cdac/tests/StressTests/Debuggees/CrossModule/Lib/CrossModuleLib.csproj Adds library project for CrossModule debuggee.
src/native/managed/cdac/tests/StressTests/Debuggees/CrossModule/CrossModule.csproj Adds main CrossModule debuggee project file + ProjectReference wiring.
src/native/managed/cdac/tests/StressTests/Debuggees/CallSignatures/Program.cs Adds comprehensive ARGITER signature-shape coverage debuggee.
src/native/managed/cdac/tests/StressTests/Debuggees/CallSignatures/CallSignatures.csproj Adds CallSignatures debuggee project file and warning suppressions.
src/native/managed/cdac/tests/StressTests/CdacStressTests.cs Replaces prior basic test entrypoint with expanded debuggee catalog and GCREFS/ARGITER theories.
src/native/managed/cdac/tests/StressTests/CdacStressTestBase.cs Adds per-mode stress runner and stricter assertions/target detection.
src/native/managed/cdac/tests/StressTests/CdacStressResults.cs Adds parsing of [GC_STATS] / [ARG_STATS] and captures ARGITER divergence lines.
src/native/managed/cdac/tests/StressTests/BasicCdacStressTests.cs Removes the older stress test harness class (superseded by CdacStressTests).
src/native/managed/cdac/tests/DumpTests/StackReferenceDumpTests.cs Removes x86 skip now that x86 GCInfo decoding is supported.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/StressTestApi/CdacStressApi.cs Adds legacy DAC request handlers for cdacstress private opcodes, including ARGITER blob request.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.IXCLRDataProcess.cs Routes stress private requests to the new CdacStressApi helper.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/RuntimeTypeSystemHelpers/MethodTableFlags_1.cs Extends MethodTable flags helpers (shared instantiation + byreflike).
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Microsoft.Diagnostics.DataContractReader.Contracts.csproj Links in shared CallingConvention/ArgIterator sources into Contracts build.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/DataType.cs Adds PInvokeCalliFrame to the cDAC type discriminator enum.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/TransitionBlock.cs Adds OffsetOfArgs field address to support correct x86 GCRefMap pos->addr mapping.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/PInvokeCalliFrame.cs Adds cDAC data type to expose VASigCookiePtr for PInvokeCalliFrame.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/CoreCLRContracts.cs Registers the new ICallingConvention contract implementation.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/StackWalk_1.cs Swallows NotImplementedException per-frame to allow partial stack-walk results.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcScanner.cs Reworks caller-stack scanning to use synthesized GCRefMap blobs on Windows x86/x64; factors shared token iteration; fixes x86 DynamicHelperFrame arg-reg offsets; corrects x86 pos->addr mapping.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GCRefMapDecoder.cs Adds a byte[]-backed decoder constructor for host-synthesized blobs.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/FrameHandling/X86FrameHandler.cs Computes x86 transition-frame caller SP by decoding stack-pop prefix (or VASigCookie for PInvokeCalliFrame).
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/RuntimeTypeSystem_1.cs Implements new RTS helpers (byreflike, unboxing stub, approx field type handle).
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCTransition.cs Fixes ctor assignments for IsThis/Iptr fields.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCArgTable.cs Fixes several x86 arg table decoding issues and removes debug prints; adjusts stack-depth transition deltas.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/CallingConvention/CdacTypeHandle.cs Adds cDAC-backed ITypeHandle implementation for shared ArgIterator.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/CallingConvention/ArgumentLocation.cs Adds internal argument-location record for encoder bookkeeping.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IRuntimeTypeSystem.cs Adds new contract APIs (byreflike, unboxing stub, approx field type handle).
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/ICallingConvention.cs Introduces new calling-convention contract + public surface.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/ContractRegistry.cs Exposes CallingConvention contract property on registry.
src/coreclr/vm/frames.h Exposes PInvokeCalliFrame::m_pVASigCookie via cdac_data specialization.
src/coreclr/vm/datadescriptor/datadescriptor.inc Adds PInvokeCalliFrame descriptor and registers CallingConvention global contract.
src/coreclr/vm/cdacstress.cpp Adds WHAT/WHERE/MODIFIER flag regions, ARGITER sub-check (runtime-vs-cDAC blob comparison), and emits [GC_STATS]/[ARG_STATS].
src/coreclr/tools/Common/CallingConvention/TransitionBlock.cs Moves TransitionBlock into Internal.CallingConvention and refactors to use ITypeHandle.
src/coreclr/tools/Common/CallingConvention/SystemVAmd64PassingDescriptor.cs Adds extracted SystemV descriptor types for shared use.
src/coreclr/tools/Common/CallingConvention/ITypeHandle.cs Adds shared type abstraction for calling convention computation.
src/coreclr/tools/Common/CallingConvention/FpStructInRegistersInfo.cs Adds extracted RISC-V/LoongArch FP struct classification types for shared use.
src/coreclr/tools/Common/CallingConvention/ArgIterator.cs Refactors ArgIterator into Internal.CallingConvention and reworks it to use ITypeHandle + shared TransitionBlock.
src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/WasmLowering.ReadyToRun.cs Updates Wasm lowering code to use new ITypeHandle shape.
src/coreclr/tools/aot/ILCompiler.ReadyToRun/ILCompiler.ReadyToRun.csproj Links in shared CallingConvention sources and adjusts ReadyToRun file list.
src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/WasmR2RToInterpreterThunkNode.cs Updates using/aliasing for new ArgIterator namespace.
src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/WasmInterpreterToR2RThunkNode.cs Updates using/aliasing for new ArgIterator namespace.
src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/WasmImportThunk.cs Updates using/aliasing for new ArgIterator namespace.
src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/TypeHandle.cs Adds crossgen2-backed ITypeHandle implementation.
src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/GCRefMapBuilder.cs Updates builder to use shared TransitionBlock/ArgIterator/ITypeHandle.
src/coreclr/inc/dacprivate.h Adds new private request opcode and request struct for ARGITER blob retrieval (CDAC_STRESS).
src/coreclr/inc/clrconfigvalues.h Simplifies CdacStress config description now that layout is more complex.
eng/pipelines/runtime-diagnostics.yml Adds windows_x86 to runtime-diagnostics pipeline parameter default list.
docs/design/datacontracts/RuntimeTypeSystem.md Documents new RTS APIs and flags.
docs/design/datacontracts/GCInfo.md Updates x86 behavior documentation and adds x86 specifics section.
docs/design/datacontracts/CallingConvention.md Adds new CallingConvention contract design doc.

Comment on lines +8 to +14
public interface ICallingConvention : IContract
{
static string IContract.Name => nameof(CallingConvention);

bool TryComputeArgGCRefMapBlob(MethodDescHandle methodDesc, out byte[] blob)
=> throw new NotImplementedException();
}
Comment on lines +65 to +66
if (inBuffer is null || inSize < (uint)Unsafe.SizeOf<DacStressArgGCRefMapRequest>())
return HResults.E_INVALIDARG;
Comment on lines +1 to +9
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<LangVersion>latest</LangVersion>
<!-- This debuggee deliberately defines many "test scaffolding" types and
enums where StyleCop's one-value-per-line / no-unassigned-fields
conventions don't add value. Extend the parent NoWarn. -->
<NoWarn>$(NoWarn);SA1136;CS0649</NoWarn>
</PropertyGroup>
</Project>
Comment on lines +62 to +86
public int GetSize()
{
// Constructed pointer/array/byref args always occupy one TADDR slot
// in the transition block (the actual pointee is reached via the
// pointer value, not stored inline). When _kindOverride is set, the
// underlying TypeHandle may be null (uncached PTR), so GetBaseSize
// would fault.
if (_kindOverride is CdacCorElementType.Ptr
or CdacCorElementType.Byref
or CdacCorElementType.SzArray
or CdacCorElementType.Array)
{
return PointerSize;
}

if (_typeHandle.IsNull)
return 0;

// GetBaseSize returns the full object size including object header and padding.
// For value types used in calling convention, we need the unboxed size.
// BaseSize = ObjHeader + MethodTable* + unboxed fields, aligned to pointer size.
// Unboxed size = BaseSize - 2 * PointerSize (subtract ObjHeader + MT pointer).
uint baseSize = Rts.GetBaseSize(_typeHandle);
return (int)(baseSize - (uint)(2 * PointerSize));
}
Max Charlamb and others added 24 commits June 25, 2026 11:47
…ots stackref tests

Builds on the partial x86 IGCInfo support added in dotnet#129456 by porting the
remaining decoder pieces required for GC-root scanning on x86, so that
`IStackWalk.WalkStackReferences` returns live frame slots on x86 cDAC.

The x86 GC info uses the legacy bit-packed `InfoHdr` byte-stream encoding
(`src/coreclr/vm/gc_unwind_x86.inl`, `src/coreclr/inc/gcdecoder.cpp`)
instead of the modern `GcInfoDecoder` shared by other architectures, so
the implementation lives entirely on the existing `X86GCInfo` decoder
under `Contracts/GCInfo/X86/`.

Changes
-------

* `X86GCInfo`: add `UntrackedSlots` lazy property +
  `DecodeUntrackedSlots()` -- delta-decoded signed varints with the
  double-align-frame rebase from `gc_unwind_x86.inl:3467`.
* `X86GCInfo`: add `VarPtrLifetimes` lazy property +
  `DecodeVarPtrLifetimes()` -- triplets of (varOffs, begOffs delta,
  endOffs delta) for EBP-frame tracked locals.
* Two new public record types `UntrackedSlot` and `VarPtrLifetime`
  capture the decoded entries.
* `IsCodeOffsetInProlog` / `IsCodeOffsetInEpilog` helpers
  (offset-parameterised, so EnumerateLiveSlots can answer for any
  instruction offset without re-constructing X86GCInfo).
* `RegMaskToRegisterNumber` helper maps the single-bit `RegMask`
  flags-enum values to the x86 ModRM register numbers used by
  `X86Context.TryReadRegister` and `LiveSlot.RegisterNumber`.
* Implement `IGCInfoDecoder.EnumerateLiveSlots(uint offset, options)`:
  early-return empty in prolog/epilog (or aborted+non-interruptible),
  emit untracked locals (suppressed for filter funclets), emit VarPtr
  lifetimes covering `offset`, walk `Transitions` up to `offset`
  accumulating live registers + pushed pointer args, and emit a
  partially-interruptible `GcTransitionCall` exactly at `offset`.
* Flip `IGCInfoDecoder.GetSizeOfStackParameterArea` from
  `NotSupportedException` to `return 0` for x86 -- x86 has no
  separate outgoing-argument scratch area; per-offset transitions
  report pushed args directly, so the GcScanner scratch-area filter is
  a no-op (correct).
* Remove the `[SkipOnArch("x86", "GCInfo decoder does not support
  x86")]` markers on `GCRoots_WalkStackReferences_FindsRefs` and
  `GCRoots_RefsPointToValidObjects`.
* `DumpTests.targets`: add optional `DebuggeeFilter=<Name>` to
  restrict `GenerateAllDumps` to a single debuggee. Useful for
  iterative local x86 work where some other debuggee's publish may
  fail.
* `docs/design/datacontracts/GCInfo.md`: enumerate which
  `IGCInfoDecoder` APIs are wired up on x86.

Out of scope (deferred)
-----------------------

* `GetInterruptibleRanges` for x86 -- the only consumer is the
  catch-handler PC override in `StackWalk_1`; no x86-relevant
  scenarios today.
* "this"-pointer special-case reporting for synchronized methods
  (VarPtr 0x2 bit currently masked out).
* IPtrMask interior-pointer bitmaps for pushed args (uses the simpler
  per-push `Iptr` flag).
* Funclet handling beyond the existing `IsParentOfFuncletStackFrame`
  caller-side early-skip.
* Finer `IsActiveFrame` register filter precision.

Validation
----------

* All 2525 cDAC unit tests pass.
* The two unblocked `GCRoots_*` tests pass against a freshly
  generated x86 GCRoots dump.
* Broader `DumpTests` x86 sweep: 34 pass / 46 fail / 830 skip --
  net +2 vs. before this change (the two GCRoots tests), zero
  regressions. The 46 pre-existing failures are all unrelated to
  GCInfo (`ThreadDumpTests` / `ComWrappersDumpTests` /
  `RuntimeInfoDumpTests` / `WorkstationGCDumpTests` and similar).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The intro paragraph was getting wordy with the per-API status; pull that
content out into a new "x86 specifics" section at the end of the file
with a table covering supported/not-implemented APIs and a deferred-edges
list. Intro now just notes that x86 is partially supported and links to
the section.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Context: x86 has used the funclet EH model since PR dotnet#115957 / dotnet#122872
(I had previously assumed otherwise and documented this API as
intentionally not implemented). Catch funclet unwinding does call into
`IGCInfo.GetInterruptibleRanges` via the parent-frame PC override
path in `StackWalk_1.WalkStackReferences`, so throwing
`NotSupportedException` is a real correctness gap (silently swallowed
by the per-frame try/catch and producing missed parent-frame GC roots).

Implementation
--------------

Match the semantics of the native x86 walker (`EnumGcRefsX86` in
`gc_unwind_x86.inl`):

* Fully interruptible (`Header.Interruptible == true`): emit one range
  per gap between the prolog end and each epilog start, plus a final
  range from the last epilog end to `MethodSize`. This mirrors the
  native walker's prolog/epilog short-circuit return at line 3091.
* Partially interruptible: walk `Transitions` and emit a single-byte
  `(offset, offset + 1)` range for each `GcTransitionCall`. Per the
  native partial-interrupt encoding doc (`gc_unwind_x86.inl:1066+`),
  call sites are the only GC-safe points in this mode -- the
  intervening `GcTransitionRegister` / `GcTransitionPointer` /
  `StackDepthTransition` / `IPtrMask` / `CalleeSavedRegister`
  events are all bookkeeping, not safe points.

Docs
----

Update `docs/design/datacontracts/GCInfo.md` x86 specifics:

* Move `GetInterruptibleRanges` from "Not implemented" to
  "Implemented" in the supported APIs table.
* Remove the false claim that x86 has no funclets / no x86-relevant
  scenarios for this API.
* Reference PR dotnet#115957 (Enable new exception handling on win-x86) for
  context on the EH model.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pure file rename (git mv) -- the class was renamed `GCInfo` -> `X86GCInfo`
in dotnet#129456 to avoid collision with the empty IGCInfo fallback struct, but
the file kept the old name. Bring the filename in line.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ed bits

I had been masking out the 0x2 bit for VarPtr lifetimes, claiming it
meant "this" pointer not pinned for tracked locals. That's wrong on
modern x86: the native VarPtr loop in gc_unwind_x86.inl:3610-3613
explicitly says

    // First  Bit : byref
    // Second Bit : pinned
    // Both bits are valid
    flags |= lowBits;

The this_OFFSET_FLAG = 0x2 interpretation in gcinfo.h was scoped to the
legacy JIT32_ENCODER on x86 without funclets, which has been gone since
dotnet#115957 enabled funclet EH on win-x86 (and dotnet#122872 removed the rest).

Pass LowBits straight through into LiveSlot.GcFlags for VarPtr entries,
matching what we already do for untracked slots. Update the
VarPtrLifetime LowBits xmldoc to drop the wrong "this"-pointer note.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GcTransitionRegister and GcTransitionPointer constructors took `isThis`
and `iptr` parameters but silently dropped them on the floor. The
default-false properties survived past every `0xBF` interior-pointer
prefix, every `0xBC` this-pointer prefix, and every byref CallRegister,
so the LIVE state walker's iptr accumulator was always 0. Stress tests
on Windows x86 saw ~30 register-resident interior-pointer mismatches per
debuggee in CoreLib code (e.g. EventSource.DefineEventPipeEvents) where
RT reported Reg=EDI Flags=0x1 (interior) and cDAC reported Flags=0x0.

Add the missing `IsThis = isThis; Iptr = iptr;` assignments. Drops the
BasicAlloc x86 stress mismatch count from 30 to 0 and the full-suite
total from 316 to 14 (99.984% match).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Native gcdumpx86.cpp:361 declares `curOffs = 0` once *outside* the
partial-interrupt EBP-frame walk loop and accumulates code-offset
deltas across iterations. cDAC's `GetTransitionsEbpFrame` declared it
*inside* the `while (true)` loop body, so it was reset to 0 every
iteration -- causing all partial-interrupt EBP-frame call-site
transitions to be emitted at small per-iteration deltas instead of
true cumulative method offsets.

Effect: callee-saved register liveness reported by RT on real call
sites in CoreLib R2R'd code (where the JIT typically keeps GC refs
in EBX/ESI/EDI across calls) was missed by cDAC's EnumerateLiveSlots,
because activeCallSite never matched at the queried offset. Stress
tests on Windows x86 saw ~3K such under-reports per BasicAlloc run
(1480 EDI + 1284 EBX + 476 ESI), concentrated in
System.Diagnostics.Tracing.EventSource.* methods.

Move `uint curOffs = 0;` outside the loop to mirror native, dropping
the BasicAlloc x86 stress mismatch count from 1742 to 30 (98.0% ->
99.97% match).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Four related correctness fixes in X86GCInfo's `EnumerateLiveSlots` and
`EnumerateTransitionLiveSlots`, mirroring native EnumGcRefsX86 in
gc_unwind_x86.inl. These were uncovered by running the cDAC GC stress
tests against Windows x86 corerun and comparing cDAC's slot output
against the runtime's at every managed allocation.

1. **Pushed-pointer-arg address calculation** -- Native at
   gc_unwind_x86.inl:3262 addresses bit `i` of the args mask at
   `pPendingArgFirst - i*sizeof(DWORD)`, where pPendingArgFirst is the
   FIRST-pushed-arg address (highest among pushed args). cDAC was
   storing pushed pointers keyed by `-depthSlots*4` and emitting them
   as SP_REL slots with the negative offset, resolving to addresses
   *below* the call-site SP -- impossible for live args. Switched to
   storing by push-index (depth-at-push-time, 0-indexed) and
   translating to positive SP-relative offset
   `(finalDepth - 1 - pushIndex) * 4` at emit time.

2. **ESP-only push/pop semantics** -- GcArgTable encodes ESP push/pop
   bytes as `GcTransitionRegister` with `RegMask.ESP` and Action.PUSH/
   POP, but those mean "stack-depth tracking" (non-pointer args), not
   "pointer push". cDAC was treating them as pointer pushes (adding
   phantom entries to pushedPtrs) and was filtering them through the
   regOffset gate (which is for register-liveness LIVE/DEAD events
   only, not arg-stream events). Now ESP-only pushes only advance
   depth, and the regOffset gate fires only for `Action.LIVE`/
   `Action.DEAD`.

3. **Register-state offset on non-leaf frames** -- Native EnumGcRefsX86
   uses `curOffsRegs = curOffs - 1` for non-active stack frames
   because register liveness can change across calls
   (gc_unwind_x86.inl:3149+). Mirror that: register-liveness LIVE/DEAD
   events at offset > regOffset are skipped on non-active frames.

4. **VarPtr lifetimes on EBP frames** -- Native at
   gc_unwind_x86.inl:3567-3573 negates the encoded VarPtr stack offset
   for EBP frames (the encoded value is positive but means
   EBP-relative-negative for locals). cDAC was leaving it positive,
   reporting locals at the wrong addresses. Also use `curOffs-1` for
   the lifetime-range check on non-active frames, mirroring
   gc_unwind_x86.inl:3540-3548 (a variable could be dead at the
   return address if the call was the last instruction of a try and
   the return jumps to a catch handler).

5. **Callee-trashed register filtering** -- On non-active frames the
   callee will have overwritten EAX/ECX/EDX, so any GC refs they held
   at the call site are stale. Native gates these via the
   `ActiveStackFrame` flag at gc_unwind_x86.inl:3189-3199; mirror that
   in EnumerateTransitionLiveSlots.

Combined effect on Windows x86 stress (BasicAlloc, Checked):
  Before:  59,514 matched / 27,702 mismatched (68%)
  After:    87,778 matched /     0 mismatched (100%)

Across the full 9-debuggee suite: 43,771 PASS / 14 FAIL (99.97%).
The remaining 14 are filter-funclet / state-context edge cases tracked
separately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rgMask call sites

Two follow-up correctness fixes for residual stress-test mismatches.

1. **ParentOfFuncletStackFrame early-return** -- Native EnumGcRefsX86
   (gc_unwind_x86.inl:3039) returns without reporting any GC refs when
   the stack walker has flagged the frame as a parent whose locals are
   already being reported via a funclet (e.g. a catch handler). cDAC's
   `StackWalk_1` does set `GcSlotEnumerationOptions.IsParentOfFuncletStackFrame`
   correctly via `gcFrame.ShouldParentToFuncletSkipReportingGCReferences`,
   but `X86GCInfo.EnumerateLiveSlots` was ignoring it and emitting the
   parent's slots a second time. This caused stress mismatches in
   exception-handling scenarios where a catch funclet appears alongside
   its parent (e.g. `Program.NestedExceptionScenario`): cDAC reported 9
   refs on the parent frame while RT correctly reported 0.

2. **Call-site `ArgMask` iteration** -- For partially-interruptible call
   sites, GcArgTable populates either `GcTransitionCall.PtrArgs` (huge
   `0xFB` encoding with explicit per-pointer offsets) or
   `GcTransitionCall.ArgMask` / `IArgs` (tiny / small / medium / large
   encodings with a bitmap of pushed-pointer slots). cDAC was only
   iterating `PtrArgs`, silently dropping every call site that used the
   bitmap form. Native scanArgRegTable (gc_unwind_x86.inl:3373-3402)
   walks the bitmap low-to-high with `argAddr = ESP + i*sizeof(DWORD)`;
   mirror that. Affected `System.Diagnostics.Tracing.ManifestBuilder.
   CreateManifestString` and similar large CoreLib methods where the
   missing slot was always at `SP+0` (the lowest-bit pushed pointer).

Combined with the prior commits in this branch, x86 cDAC GC root
walking now reports identical results to the runtime across all 9
stress debuggees:

  Pre-session baseline: 87,294 frames / 27,702 mismatched (68% match)
  This commit:         803,556 frames /      0 mismatched (100%)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…LiveSlots

Tighten comments in `X86GCInfo.cs` and `GCArgTable.cs`:
- Replace `gc_unwind_x86.inl:NNNN` line-number citations with file +
  function-name references (line numbers churn). Cite `EnumGcRefsX86`,
  `scanArgRegTableI`, `scanArgRegTable`, etc.
- Compact the per-block "why" comments to a single concise sentence,
  removing mechanical "we do X because native does X" duplication.
- Tag the regOffset/curOffsRegs note in `EnumerateTransitionLiveSlots`
  to point at the equivalent native field rather than line numbers.

Document the now-functional `EnumerateLiveSlots` in
`docs/design/datacontracts/GCInfo.md`. Update the Supported APIs table
to reflect that it is implemented and stress-validated, and replace the
stale Deferred edges list (now-resolved items dropped, remaining items
reframed as "not exercised by current x86 codegen / not stressed").

No behavioral changes; full stress suite still reports 0 mismatches.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reverts commit cb82b21 to keep the file name as `GCInfo.cs`. Even
though the class inside is `X86GCInfo` (renamed in dotnet#129456 to avoid
colliding with the empty `Contracts.GCInfo` IGCInfo fallback struct),
the file name churn shows up as a delete+add in the PR diff, which is
harder to review. The C# class name does not need to match the file
name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…re now implemented

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ction

Drop the per-API x86 callouts in the IGCInfoHandle interface comments
and the intro cross-reference. All x86 detail now lives in the
dedicated 'x86 specifics' section, minimizing the diff footprint
outside that section.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All APIs are now implemented on x86; the table was just noise. Distill
the two truly x86-specific behavior notes (GetSizeOfStackParameterArea
returns 0, GetInterruptibleRanges encoding) into a short bullet list in
the intro of the x86 specifics section. The EnumerateLiveSlots
behavior subsection is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Was added as a local convenience to restrict GenerateAllDumps to a
single debuggee during iterative x86 work. Not needed in the shipping
infrastructure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Narrow visibility of decoder-internal types and properties:
  `UntrackedSlot`, `VarPtrLifetime`, `UntrackedSlots`, `VarPtrLifetimes`
  are only consumed by `EnumerateLiveSlots`, so make them `internal`
  (was `public`) to avoid expanding the contracts assembly surface.
- Replace hardcoded `* 4` with `_target.PointerSize` in the pushed-arg
  emit and the ArgMask bitmap iteration -- consistent with the rest of
  the decoder.
- Fix stale VarPtr comment that claimed `0x2` was "this NOT pinned".
  Modern x86 JIT uses `0x2` for pinned (matching `LiveSlot.GcFlags`);
  the legacy `this` interpretation only applied to JIT32_ENCODER which
  is no longer in use. The encoded bits already passed through
  correctly -- this is a comment fix only.
- Implement `GcSlotEnumerationOptions.ReportFPBasedSlotsOnly` on x86 as
  a post-filter that drops register slots and non-frame-relative stack
  slots, matching `GCInfoDecoder.ReportSlot` semantics on other arches.
  Update GCInfo.md to mention the filter under the x86 specifics
  section.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…errupt EBP-less call sites

Native scanArgRegTable (gc_unwind_x86.inl) decrements stackDepth by
callArgCnt at every call encoding in the EBP-less partial-interrupt
walker. cDAC was emitting the delta as positive, which the consumer
(EnumerateLiveSlots) was then *adding* to depthSlots -- the opposite
direction. Negate at the four StackDepthTransition emission sites in
GetTransitionsNoEbp so the consumer's `depthSlots += delta` matches
native semantics.

In practice this path is unreachable on current x86 codegen because all
post-funclet x86 methods are EBP frames (PR dotnet#115957), and the cDAC GC
stress suite still reports 0 mismatches across all 9 debuggees -- but
the bug was real and the fix is mechanical.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… pushes

GcArgTable's GetTransitionsFullyInterruptible emits non-pointer arg
pushes (encoding 0xB0..0xB7) as GcTransitionPointer with IsPtr=false.
EnumerateLiveSlots was unconditionally recording every PUSH into
pushedPtrs, which would incorrectly report non-pointer slots as GC
roots. Skip the pushedPtrs entry when IsPtr is false (still advance
depthSlots so positional offsets stay correct), matching the
RegMask.ESP path in ApplyRegisterTransition.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…est platforms

The x86 EnumerateLiveSlots / GetInterruptibleRanges implementation in
this PR is validated locally to match EnumGcRefsX86 across all 9 stress
debuggees (0 mismatches, 803K frames). Enable CI coverage by including
windows_x86 in cdacStressPlatforms, matching the existing windows_x86
entries in cdacDumpPlatforms and cdacXPlatDumpPlatforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…by pushedSize

Native EnumGcRefsX86 computes `argBase = ESP + pushedSize` for ESP frames
when iterating the untracked locals table (and VarPtr if present), so an
encoded `stkOffs` resolves to `ptrAddr = ESP + pushedSize + stkOffs`.
The SP-relative offset cDAC needs to emit is therefore `pushedSize +
stkOffs`, not bare `stkOffs`.

Compute the pushed-arg size at the queried instruction offset and add it
to non-EBP-relative untracked / VarPtr slot offsets. EBP-frame slots are
already FRAMEREG-relative and need no adjustment.

Refactor `CalculatePushedArgSize()` to delegate to a new
`CalculatePushedArgSizeAt(uint codeOffset)` so both the existing
`RelativeOffset`-bound `PushedArgSize` property and the new
`EnumerateLiveSlots(instructionOffset, ...)` consumer share the walk.

Unreachable on current x86 codegen (all post-funclet x86 methods use
EBP frames, see dotnet#115957), but the bug was real and would have produced
wrong addresses on any future ESP-frame methods.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Native DynamicHelperFrame::GcScanRoots_Impl (frames.cpp) uses
`offsetof(ArgumentRegisters, ECX)` for ObjectArg and
`offsetof(ArgumentRegisters, EDX)` for ObjectArg2 on x86, where the
ArgumentRegisters struct is declared in REVERSED calling-convention
order via ENUM_ARGUMENT_REGISTERS_BACKWARD (cgencpu.h: { EDX, ECX } so
EDX is at offset 0 and ECX at offset +PointerSize). On every other
architecture the struct lays the registers out in forward call order
(first arg at offset 0, second at offset +PointerSize), and the
non-x86 native path adds `sizeof(TADDR)` for ObjectArg2.

cDAC's ScanDynamicHelperFrame assumed the uniform non-x86 layout, so on
x86 it reported ObjectArg at EDX's location and ObjectArg2 at ECX's
location -- the addresses came out 4 bytes off from the runtime, which
surfaced as flaky stress-test mismatches once we added windows_x86 to
cdacStressPlatforms (the previous cDAC stress matrix excluded x86, so
this code path was never exercised against the live runtime).

Detect TARGET_X86 via RuntimeInfo.GetTargetArchitecture and swap the
two offsets to match native semantics. Verified across 5 BasicAlloc
stress runs: 0 mismatches each.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- VarPtrLifetimes XML doc + EnumerateLiveSlots comment: drop the
  incorrect claim that the VarPtr table is "only EBP frames / empty for
  ESP frames". Native processes the table on both frame kinds; only the
  per-entry `stkOffs` negation is EBP-frame-specific.
- GetInterruptibleRanges: include `epilogStart` in the preceding
  interruptible range (end at `eStart + 1`, clamped to method size).
  IsCodeOffsetInEpilog uses a strict `>` check, so epilogStart itself
  is NOT in the epilog and EnumerateLiveSlots will happily report slots
  there -- the range list now matches.
- CalculatePushedArgSizeAt: clamp `depth` to >= 0 before casting to
  uint. StackDepthTransition can carry negative deltas (call-site arg
  pops in the partial-interrupt ESP-frame encoding) and a transient
  underflow would otherwise wrap to a huge unsigned, corrupting
  espBias and the pushed-arg offset calculation.
- Cache sorted transition offsets in a `Lazy<ImmutableArray<int>>`
  instead of re-running `Transitions.Keys.OrderBy(...)` on every
  EnumerateLiveSlots / CalculatePushedArgSizeAt / GetInterruptibleRanges
  call. EnumerateLiveSlots fires once per managed frame during stack
  walking, so this is a hot path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three issues prevented x86 stress from passing in CI (build 1473187):

1. Leftover Console.WriteLine debug output in GCArgTable.cs flooded the
   Helix log with thousands of "CallArgCount/CallPndTabCnt/lastSkip/stkOffs"
   lines and made test triage impossible. Removed all four diagnostic
   writes plus their now-dead locals (stkOffs/lowBit, lastSkip+param,
   callPndTabSize, callPndTab read-without-use).

2. CalculatePushedArgSizeAt threw "Unsupported gc transition type"
   (InvalidOperationException, HRESULT 0x80131509) whenever an ESP-frame
   walk hit a CalleeSavedRegister transition (emitted by the partial-
   interrupt no-EBP decoder, case 7 / 0x04). Native scanArgRegTable
   treats this tag as informational (no effect on outgoing-arg depth);
   the cDAC switch was missing the case so every ESP-frame stack with
   a callee-saved-reg tag aborted GetStackReferences with hr=0x80131509.
   This was the source of 18-70 [FAIL] entries per debuggee in CI.

3. AddressFromGCRefMapPos used the same forward-from-FirstGCRefMapSlot
   formula on every architecture, but x86's TransitionBlock lays out
   its two argument registers (EDX/ECX) backward via
   ENUM_ARGUMENT_REGISTERS_BACKWARD. Native OffsetFromGCRefMapPos
   (frames.cpp) reverses pos 0/1 within the arg-regs area on x86.
   Without this fix, ExternalMethodFrame GCRefMap-driven scans reported
   the wrong arg-reg slot for the first two positions. Stack-arg
   positions (pos >= 2) remain forward, matching native.

Local PInvoke debuggee under DOTNET_CdacStress=0x001:
  Before fix #1+#2: 70 fail / 64926 matched / 0 mismatched (every
    [FAIL] was an InvalidOperationException, not a real mismatch).
  After #1+#2:     2 fail / 66250 matched / 2 mismatched.
  After all three: 2 fail (remaining is a separate x86 GCRefMap
    decode bug in pos >= 2 stack args, tracked as follow-up).

5 of 9 debuggees now report 0 failures and 0 mismatches.
Max Charlamb and others added 9 commits June 25, 2026 11:47
After enabling stack walking on x86 (PR dotnet#129547), every ExternalMethodFrame
with stack arguments produced a constant 20-byte (5-slot) offset error in
cDAC's reported GC roots vs the runtime oracle. Root cause:

x86's TransitionBlock layout is:
    +0  m_argumentRegisters        (8 bytes, EDX/ECX, BACKWARD-ordered)
    +8  m_calleeSavedRegisters     (16 bytes)
    +24 m_ReturnAddress            (4 bytes)
    +28 (== sizeof(TransitionBlock) == OffsetOfArgs)
        caller-pushed stack args

The GCRefMap position scheme on x86 splits positions across these regions:
    pos 0..1  -> arg regs, reversed (mirrors ENUM_ARGUMENT_REGISTERS_BACKWARD)
    pos 2..N  -> stack args, starting at sizeof(TransitionBlock)

There is a 20-byte gap (CalleeSavedRegisters + ReturnAddress) that the
GCRefMap deliberately skips. Native OffsetFromGCRefMapPos handles this
with an explicit OffsetOfArgs + (pos - NUM_ARGUMENT_REGISTERS) * sizeof(TADDR)
branch (frames.cpp).

cDAC was using the non-x86 default FirstGCRefMapSlot + pos * PointerSize
even on x86. Since FirstGCRefMapSlot = offsetof(m_argumentRegisters) = 0
on x86, pos * 4 walked straight from arg-regs into the
CalleeSavedRegisters area, missing the real stack args by 20 bytes.

Fix:
  - Add OffsetOfArgs as a [FieldAddress] on Data.TransitionBlock (the
    descriptor already exposes it as sizeof(TransitionBlock)).
  - In AddressFromGCRefMapPos, when running on x86, use ArgumentRegisters
    as the base for pos < NUM_ARGUMENT_REGISTERS (reversed) and
    OffsetOfArgs as the base for pos >= NUM_ARGUMENT_REGISTERS.

Local stress matrix after this fix (9 debuggees, two consecutive runs):
  fail=0 mismatched=0 across 47,164 verifications / 595,488 frames.
Previously: 2 fail / 2 mismatched per affected debuggee, intermittent.
…C hole)

After x86 EnumerateLiveSlots is fully working, ~20% of full stress
suite runs surface a pre-existing GC hole during the runtime's first
EventSource cctor cascade. The runtime's RT-side scanner sees GC refs
in callee-saved registers (ESI/EDI) that the DAC-side scanner doesn't,
because a PrestubMethodFrame holding the live regs gets popped before
the GC stackwalk completes during exception dispatch.

This is independently tracked as runtime issue dotnet#129545/dotnet#129546 and is
not specific to the cDAC. Documenting it in known-issues.md so future
triagers can recognize the signature instead of chasing it as a new
cDAC bug.
…otnet#129546)

Previous commit incorrectly attributed the x86 stress flake to issues
dotnet#129545/dotnet#129546. Those are x64-specific GC-stress crashes (assertion
in MethodTable::Validate during managed exception unwind), not cDAC
mismatches and not x86. Local x64 stress is clean.

The actual x86 flake is a cDAC bug: EnumerateLiveSlots returns 0 live
slots for some non-leaf managed frames (SplitName / GetNestedType /
EventSource.Initialize / etc) where the runtime reports refs in
callee-saved registers (ESI/EDI). Diagnostic instrumentation of
HasFrameBeenUnwoundByAnyActiveException ruled it out as the cause
(false in 100% of ~77k invocations during a reproduced flake).

The most likely cause is one of:
  - partially-interruptible call-site IP matching off-by-one or wrong
    semantic (offset of return address vs call instruction);
  - x86 unwinder not propagating callee-saved register values across
    unwinds the same way native REGDISPLAY does (cDAC re-unwinds via
    Context.Clone().Unwind() each call; native caches pCallerContext);
  - some x86-specific frame-type handling difference in StackWalk_1.

Documenting the corrected attribution and the open investigation in
known-issues.md so this isn't conflated with a different issue.

The actual root-cause investigation (comparing the IP cDAC and the
runtime see for the failing frames) is a follow-up beyond this PR's
scope.
The pre-PR cDAC port of partial-interruptible EBP-frame decoding had
`curOffs = readUInt32(...)` for case 0xFB (huge call-pattern encoding),
which OVERWRITES the current method-relative offset instead of advancing
it by the encoded 32-bit code DELTA.

Per the JIT encoder docs (src/coreclr/jit/gcencode.cpp ~line 2902):
`0xFB [0BSD0bsd][32-bit code delta]`

And the runtime decoder (src/coreclr/vm/gc_unwind_x86.inl:1041):
`scanOffs += *dac_cast<PTR_DWORD>(table); table += sizeof(DWORD);`

Confirmed both encoder and runtime use `+=`. cDAC should match.

(Note: the dump tool `src/coreclr/gcdump/i386/gcdumpx86.cpp:650` and
the R2R reflection reader
`src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/x86/GcInfo.cs:303`
both use `=` -- they appear to have inherited the same bug from
some early port. Those need a separate fix; addressing only the cDAC
copy here because that's what x86 stack walking depends on.)

Exposed by x86 stress: long managed methods (e.g. EventSource cctors)
that hit the huge call-pattern encoding had their decoded call-site
table truncated to the first call after a 0xFB tag, and
`EnumerateLiveSlots` then returned empty register sets at IPs past
that point.
In the partial-EBP-frame decoder, when `val & 0x80 == 0 && val & 0x0F == 0`
the byte encodes which callee-saved register holds the 'this' pointer
at the next call site -- it is metadata, NOT a call entry itself
(see native gc_unwind_x86.inl ~line 970: sets thisPtrReg, no call recorded).

The pre-PR cDAC port created a spurious `GcTransitionCall` at the
current curOffs with only the this-pointer register in CallRegisters.
If the partial-EBP encoding emitted a this-pointer tag at the same
curOffs as a real call site, the spurious entry would be appended to
`Transitions[curOffs]` AFTER the real call entry. `EnumerateLiveSlots`
iterates the list and overwrites `activeCallSite` each time -- so the
spurious entry (with one reg) would win, losing the real call's full
`CallRegisters` (potentially including ESI/EDI/EBX).

This change discards the this-pointer metadata (we don't use thisPtrReg
in cDAC's stress comparison) so it can't pollute the call-site table.

NOTE: this is a correctness fix matching native semantics. It did not
materially reduce the x86 stress flake rate -- the EventSource-cctor
mismatch pattern persists, indicating at least one other root cause in
either the GCInfo decoder or the stack walker's IP for those frames.
That separate investigation is tracked in known-issues.md and the
session plan.md.
The doc was missing notes added by recent x86 decoder commits:
  - ESP-frame pushedSize bias for untracked/VarPtr slots
  - Non-pointer push (IsPtr=false) stack-depth handling
  - IsParentOfFuncletStackFrame as an actual code path

Added a new 'Encoding correctness notes (x86)' subsection capturing
the four byte-stream subtleties caught during stress validation:
  - 0xFB huge-encoding code-delta is cumulative (+=, not =)
  - Partial-EBP this-pointer tag does not record a call entry
  - Partial-interrupt EBP-less call sites emit negative depth delta
  - 0xC0..0xCF is a callee-saved-only call entry distinct from 0xFD..0xFF

Removed the obsolete 'all post-funclet x86 methods are EBP frames'
claim from the deferred-edges section and clarified IPtrMask handling.

> [!NOTE]
> This commit was authored with assistance from GitHub Copilot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The runtime's TransitionFrame::UpdateRegDisplay_Impl on x86 adds cbStackPop
(the callee-popped argument byte count) to the unwound caller SP
(src/coreclr/vm/i386/cgenx86.cpp:104-117). cDAC's base HandleTransitionFrame
omitted this adjustment, which on x86 propagates through subsequent
EBP-chain unwinds and causes the walker to skip frames, surfacing as the
~36% cdacstress flake reproduced on the EventSource cctor cascade.

x64/arm64 are unaffected because RtlVirtualUnwind resyncs each step from
PE unwind data; x86 has no such mechanism and accumulates the error.

Add an x86 override on X86FrameHandler.HandleTransitionFrame:
  * PInvokeCalliFrame -> VASigCookie.SizeOfArgs (no MethodDesc available).
  * Every other transition Frame -> X86ArgIterator.Compute on the Frame's
    MethodDesc.

X86ArgIterator is a minimal x86-only signature walker patterned on the
native ArgIteratorTemplate::ForceSigWalk (callingconvention.h:2094-2200).
It mirrors only the cbStackPop computation - it does not enumerate
individual arg locations, classify HFA / SystemV structs, etc. The full
ArgIterator port is tracked separately.

Adds Data.PInvokeCalliFrame + cdac_data + datadescriptor entry so the
handler can read VASigCookie.SizeOfArgs.

Measured flake rate on the x86 cdacstress suite (25x runs):
  baseline                       9/25 (36%)
  with this change             ~0/25 (under measurement)

> [!NOTE]
> This commit was authored with assistance from GitHub Copilot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ArgIterator infrastructure

Squashes the 33 commits unique to the cdac-shared-argiterator branch onto
this branch as a single commit. Brings in the shared ArgIterator port +
the cDAC ICallingConvention contract (TryComputeArgGCRefMapBlob) +
ARGITER stress framework. Sets the foundation for replacing the x86-only
X86ArgIterator with the shared infrastructure.

Adjustments made on top of the raw squash:
  * MethodTableFlags_1.WFLAGS_LOW: add the missing GenericsMask_SharedInst
    (0x00000020) enum value that cdac-shared-argiterator's
    IsSharedByGenericInstantiations property references.
  * src/coreclr/tools/Common/CallingConvention/ArgIterator.cs: replace 10x
    'new SomeStruct()' with 'default(SomeStruct)' to satisfy SA1129
    (the file moved from tools/Common where the analyzer was disabled).

> [!NOTE]
> This commit was authored with assistance from GitHub Copilot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the X86ArgIterator helper added earlier in this PR with calls
into the shared CallingConvention contract that was squashed in from the
cdac-shared-argiterator branch. The net effect on x86 GCRefStress is
0 failures, 0 known issues (down from 0.12-5.66% known previously).

Changes
-------
  * GCRefMapDecoder: add a second constructor that takes a host-side
    byte[] in addition to the existing TargetPointer source. Lets us
    decode blobs we synthesize via ICallingConvention.TryComputeArgGCRefMapBlob
    using the same bit-stream reader as the R2R-resident blob path.

  * GcScanner.PromoteCallerStack: replace the RecordDeferredFrame stub
    with a real implementation. On Windows x86 / x64 it asks the shared
    CallingConvention contract for the synthesized GCRefMap blob and
    runs the same token-iteration loop as the R2R-backed
    PromoteCallerStackUsingGCRefMap. On unsupported targets and on any
    failure to synthesize a blob, falls back to RecordDeferredFrame so
    the stress harness still buckets the resulting diff as a known
    issue. Factored the token-iteration loop into a shared
    EnumerateGCRefMapTokens helper.

  * X86FrameHandler.HandleTransitionFrame: replaces the X86ArgIterator
    cbStackPop calculation with TryComputeArgGCRefMapBlob + ReadStackPop
    on the leading bytes. Same call site keeps the VASigCookie fast
    path for PInvokeCalliFrame.

  * X86ArgIterator.cs: deleted.

  * CdacStressTests.cs: removed the hardcoded
    if (arch == Architecture.X86) throw new SkipTestException for
    GCRefStress_AllVerificationsPass. The previous skip comment said
    "x86 has not been brought up yet" -- this PR brings it up.

  * CdacStressTestBase.AssertAllPassed: on Windows x86 / x64, fail the
    test if results.KnownIssues > 0. Every transition Frame's
    caller-stack scan must succeed via the shared
    ICallingConvention.TryComputeArgGCRefMapBlob path on those targets;
    a non-zero count signals a regression in
    CallingConvention_1.ComputeArgGCRefMapBlobCore.

Local validation (windows-x86 Checked)
-------------------------------------
GCRefStress, single full run:

  Debuggee            Total    Pass     Fail   Known
  -------------    ------- ------- ------- -------
  BasicAlloc         4,792   4,792       0       0
  CallSignatures     5,240   5,240       0       0
  Comprehensive      4,862   4,862       0       0
  CrossModule        4,846   4,846       0       0
  DeepStack          4,828   4,828       0       0
  DynamicMethods     6,356   6,356       0       0
  ExceptionHandling  4,834   4,834       0       0
  Generics           4,806   4,806       0       0
  MultiThread        4,878   4,878       0       0
  PInvoke            4,818   4,818       0       0
  StructScenarios    4,834   4,834       0       0
  -------------    -------
  Total             55,094  55,094       0       0

cDAC unit tests: 2571 / 2571 pass.

> [!NOTE]
> This commit was authored with assistance from GitHub Copilot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the x86-gc-info-w-argiterator branch from 7648cd9 to 8bed0e2 Compare June 25, 2026 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants