Compute all checksums before emitting into image stream#129016
Compute all checksums before emitting into image stream#129016jtschuster wants to merge 3 commits into
Conversation
crossgen2 and ilc enable Server GC unconditionally. On a 32-bit host process the ~2 GB user-mode address space is largely consumed up front by Server GC's per-heap segment reservations (observed ~1.5 GB / 75% reserved across 4 heaps with only ~296 MB committed). Compiling a very large method then fails to allocate the contiguous output-image buffer (MemoryStream.ToArray in EmitChecksums, ~10 MB for HugeArray1) and the process fail-fasts with OutOfMemory, even though real memory use is low. Use Workstation GC when the compiler process architecture is 32-bit (x86/arm/armel). The process architecture is CrossHostArch when cross-building, otherwise TargetArchitecture, mirroring the existing TargetArchitectureForSharedLibraries logic. Cross-targeted product builds run crossgen2 as a 64-bit host and keep Server GC, so only the genuinely 32-bit tool process is affected. Fixes dotnet#128531 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run runtime-coreclr outerloop |
|
Tagging subscribers to this area: @anicka-net, @dotnet/gc |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Updates the CoreCLR AOT tool build settings so crossgen2/ilc default to Workstation GC when the compiler process is 32-bit, avoiding large Server-GC virtual address-space reservations that can lead to OOMs in constrained 32-bit address spaces.
Changes:
- Infer the AOT compiler host process architecture from
CrossHostArch(when set) orTargetArchitecture. - Set
ServerGarbageCollectiononly when not explicitly provided, defaulting totrueon non-x86/arm/armelhosts andfalseotherwise. - Add inline rationale documenting why Server GC is disabled for 32-bit hosts.
This is the first time I'm seeing The way EmitChecksums is implemented now, we need the copy because each checksum writer writes into the output but the subsequent checksums want to see the original bytes. We should instead stage this so that each checksum provider can compute the checksum using the unmodified stream (original stream, not array) and at the end we write out the values. |
EmitChecksums copied the entire output image into a MemoryStream and then into a contiguous byte[] (MemoryStream.ToArray) so that each checksum and the deterministic PE timestamp could be computed over the original, unmodified bytes. That copy is the size of the output image - 10s to 100s of MB for large R2R images, and 1+ GB for native AOT with debug info. On a 32-bit compiler host the ~2 GB user-mode address space is easily too fragmented to satisfy that single contiguous allocation, so compiling a very large method fail-fasts with OutOfMemory even though real memory use is low. Compute each checksum directly from the seekable output stream instead of from a full in-memory copy, and defer writing the checksum/timestamp values until all of them have been computed. Deferring the writes preserves the previous behavior of hashing the image before any checksum is written, so the output is byte-for-byte identical (verified by crossgen'ing a framework assembly before and after the change). IChecksumNode.EmitChecksum now takes the output Stream rather than a ReadOnlySpan<byte> over the whole image. Fixes dotnet#128531 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This reverts commit f54edf3.
|
Modified EmitChecksums to calculate each checksum from the image before finally emitting the checksums into the image. This should be byte-for-byte identical to previous behavior. |
MichalStrehovsky
left a comment
There was a problem hiding this comment.
Looks great, thank you!
Instead of creating a copy of the entire image in-memory, use the existing stream to compute the checksum before emitting the checksum relocs into the final image. This avoids a large allocation and should prevent some of the OOMs we're seeing in 32-bit CI.
Note
This pull request was authored with assistance from GitHub Copilot.