Skip to content

Compute all checksums before emitting into image stream#129016

Open
jtschuster wants to merge 3 commits into
dotnet:mainfrom
jtschuster:jtschuster/cautious-fortnight
Open

Compute all checksums before emitting into image stream#129016
jtschuster wants to merge 3 commits into
dotnet:mainfrom
jtschuster:jtschuster/cautious-fortnight

Conversation

@jtschuster
Copy link
Copy Markdown
Member

@jtschuster jtschuster commented Jun 4, 2026

Instead of creating a copy of the entire image in-memory, use the existing stream to compute the checksum before emitting the checksum relocs into the final image. This avoids a large allocation and should prevent some of the OOMs we're seeing in 32-bit CI.

Note

This pull request was authored with assistance from GitHub Copilot.

crossgen2 and ilc enable Server GC unconditionally. On a 32-bit host
process the ~2 GB user-mode address space is largely consumed up front
by Server GC's per-heap segment reservations (observed ~1.5 GB / 75%
reserved across 4 heaps with only ~296 MB committed). Compiling a very
large method then fails to allocate the contiguous output-image buffer
(MemoryStream.ToArray in EmitChecksums, ~10 MB for HugeArray1) and the
process fail-fasts with OutOfMemory, even though real memory use is low.

Use Workstation GC when the compiler process architecture is 32-bit
(x86/arm/armel). The process architecture is CrossHostArch when
cross-building, otherwise TargetArchitecture, mirroring the existing
TargetArchitectureForSharedLibraries logic. Cross-targeted product
builds run crossgen2 as a 64-bit host and keep Server GC, so only the
genuinely 32-bit tool process is affected.

Fixes dotnet#128531

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jtschuster
Copy link
Copy Markdown
Member Author

/azp run runtime-coreclr outerloop

@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @anicka-net, @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the CoreCLR AOT tool build settings so crossgen2/ilc default to Workstation GC when the compiler process is 32-bit, avoiding large Server-GC virtual address-space reservations that can lead to OOMs in constrained 32-bit address spaces.

Changes:

  • Infer the AOT compiler host process architecture from CrossHostArch (when set) or TargetArchitecture.
  • Set ServerGarbageCollection only when not explicitly provided, defaulting to true on non-x86/arm/armel hosts and false otherwise.
  • Add inline rationale documenting why Server GC is disabled for 32-bit hosts.

@MichalStrehovsky
Copy link
Copy Markdown
Member

MemoryStream.ToArray in EmitChecksums, ~10 MB for HugeArray1.dll

This is the first time I'm seeing EmitChecksums. We should fix EmitChecksums instead. The array this is allocating is the size of the output. It is not unheard of to have a 10s or 100s of MB R2R image (it doesn't even have to be code, this could be embedded resources). For native AOT this could be 1+ GB since it includes debug info. At this point of the compilation, we're already holding all of the outputs in memory, also the serialized copy in the stream, and this is making another copy. For a big app it's likely we don't have enough memory to make this copy, irrespective of server GC or not.

The way EmitChecksums is implemented now, we need the copy because each checksum writer writes into the output but the subsequent checksums want to see the original bytes. We should instead stage this so that each checksum provider can compute the checksum using the unmodified stream (original stream, not array) and at the end we write out the values.

jtschuster and others added 2 commits June 5, 2026 14:05
EmitChecksums copied the entire output image into a MemoryStream and then
into a contiguous byte[] (MemoryStream.ToArray) so that each checksum and
the deterministic PE timestamp could be computed over the original,
unmodified bytes. That copy is the size of the output image - 10s to 100s
of MB for large R2R images, and 1+ GB for native AOT with debug info. On a
32-bit compiler host the ~2 GB user-mode address space is easily too
fragmented to satisfy that single contiguous allocation, so compiling a
very large method fail-fasts with OutOfMemory even though real memory use
is low.

Compute each checksum directly from the seekable output stream instead of
from a full in-memory copy, and defer writing the checksum/timestamp values
until all of them have been computed. Deferring the writes preserves the
previous behavior of hashing the image before any checksum is written, so
the output is byte-for-byte identical (verified by crossgen'ing a framework
assembly before and after the change).

IChecksumNode.EmitChecksum now takes the output Stream rather than a
ReadOnlySpan<byte> over the whole image.

Fixes dotnet#128531

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jtschuster
Copy link
Copy Markdown
Member Author

Modified EmitChecksums to calculate each checksum from the image before finally emitting the checksums into the image. This should be byte-for-byte identical to previous behavior.

@jtschuster jtschuster changed the title Disable Server GC for 32-bit crossgen2/ilc hosts Compute all checksums before emitting into image stream Jun 5, 2026
Copy link
Copy Markdown
Member

@MichalStrehovsky MichalStrehovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants