Skip to content

[Diagnostics] In-Proc CrashReport lifecycle#128738

Merged
mdh1418 merged 11 commits into
dotnet:mainfrom
mdh1418:inproc_crashreport_lifecycle_impl
Jun 4, 2026
Merged

[Diagnostics] In-Proc CrashReport lifecycle#128738
mdh1418 merged 11 commits into
dotnet:mainfrom
mdh1418:inproc_crashreport_lifecycle_impl

Conversation

@mdh1418

@mdh1418 mdh1418 commented May 29, 2026

Copy link
Copy Markdown
Member

On platforms where createdump can't fork a helper process (notably Android and Apple mobile under CoreCLR), crash reports are written in-proc from the faulting thread. Today those reports are written to a single configured path with no management: nothing controls where they accumulate or how many are kept, so they pile up unbounded in app-private storage.

This feature gives the in-proc crash reporter a durable, bounded, app-local directory for its *.crashreport.json output. The design borrows the proven mechanics that the mobile platforms already use for their own crash artifacts:

Android (tombstones). The OS keeps a bounded rotating set of tombstone files. tombstoned writes completed dumps to /data/tombstones/tombstone_00, tombstone_01, …, with a default cap of 32 (tombstoned.max_tombstone_count). Dumps are first written to anonymous temp files and only linked into their final tombstone_NN path once the dumper reports completion, so a failed or timed-out dump never leaves a named partial tombstone.

iOS / Apple. Crash reports are OS-managed diagnostic logs (.ips/.crash), generated after the app is allowed to finish crashing. They live in the device's Analytics Data and are surfaced via Xcode/Console, TestFlight, and the App Store. Retention and pruning are entirely Apple/OS-managed — there is no app-controlled rotation contract.

This PR introduces an opt-in crash-report lifecycle that owns the report directory layout and applies a retention policy, all using async-signal-safe primitives on the crash path.

  • Directory layout - reports are written under a managed tree rooted at CrashReportRootPath: <root>/.dotnet/crash-reports/report-<timestampNs>-<pid>.crashreport.json
  • Retention policy (DOTNET_CrashReportMaxFileCount):
  • DOTNET_CrashReportMaxFileCount (positive integer, default 32): prune oldest reports (by the timestamp encoded in the filename) so at most N are retained, reserving a slot for the new report. Invalid/out-of-range values fall back to the defaultreserving a slot for the new report.
  • Atomic, never-overwrite publish - a completed report is moved from a temp file to its final name with a same-directory rename (async-signal-safe per signal-safety(7)), guarded so an existing report is never clobbered.
  • Init-time capability probe - at startup the lifecycle verifies the directory permits create + rename + delete; if a sandbox rejects the publish primitive, file output is disabled up front with a clear diagnostic instead of silently losing reports at crash time.

Configuration

  • DOTNET_EnableCrashReport - opt in (existing).
  • DOTNET_CrashReportRootPath - root directory for the managed report tree (new).
  • DOTNET_CrashReportMaxFileCount - positive retention bound (default 32).

Testing

Validated end-to-end on an Android x64 emulator (CoreCLR): the bounded retention policy holds (oldest pruned at the cap), reports are valid JSON, the never-overwrite invariant holds, and the init probe leaves no stray artifacts. Design notes are in inproccrashreportlifecycle.design.md.

mdh1418 and others added 5 commits May 28, 2026 13:22
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce InProcCrashReportLifecycle to own the on-disk lifecycle of
in-proc crash reports: choosing the output directory, validating its
writability, generating report names, and enforcing the retention cap so
the crash path only has to emit a report into a prepared location.

The crash-time entry points (preparing the report file, building paths)
remain allocation-free and signal-safe; the initialization-time work
(directory probing, pruning) runs once during startup before the crash
handler is armed and may use the heap and non-async-signal-safe calls.

Shared string copy/append helpers used by both the reporter and the
lifecycle are factored into CrashReportStringUtils
(crashreportstringutils.{h,cpp}) so the two callers no longer carry
duplicate implementations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add design notes co-located with the in-proc crash report lifecycle source, capturing the Android/iOS crash-artifact investigation, the configuration and on-disk layout, the init- vs crash/signal-path split, the retention model, and the key design decisions and review findings that shaped the implementation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
FinishReportFile published the completed report via link() followed by an unconditional unlink() of the temp file, discarding the link() return value. Android and Apple mobile app-private storage reject hard links with EPERM, so every crash report was silently lost on the exact platforms this in-proc feature targets.

Replace the link+unlink commit with a same-directory rename, which is POSIX async-signal-safe and permitted in those sandboxes. A best-effort access(final, F_OK) check before the rename preserves the exclusive-create intent that link's EEXIST previously provided; the final name is collision-resistant by construction, so the residual TOCTOU window is benign. On any failure the temp file is still removed.

Update the header contract and design doc to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ProbeDirectoryWritable now also exercises a same-directory rename (the primitive FinishReportFile uses to publish a completed report), not just create/delete. If a sandbox permits create/delete but rejects rename, file output is disabled at initialization with a clear diagnostic instead of silently losing every report on the signal path at crash time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service

Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds lifecycle-managed storage for CoreCLR in-proc crash reports, including a managed directory layout, retention modes, and safer temp-to-final publishing for platforms where reports are written from the faulting process.

Changes:

  • Adds crash report lifecycle management with root path configuration, retention parsing, directory setup, pruning, and temp-file publishing.
  • Wires lifecycle-managed output into the existing in-proc crash reporter while preserving legacy DbgMiniDumpName behavior.
  • Adds shared crash-report string helpers, CMake integration, config entries, and design documentation.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/coreclr/vm/crashreportstackwalker.cpp Reads new crash-report lifecycle configuration and passes it into reporter settings.
src/coreclr/inc/clrconfigvalues.h Adds CrashReportRootPath and CrashReportMaxFileCount configuration entries.
src/coreclr/debug/crashreport/inproccrashreportlifecycle.h Declares the lifecycle manager for report directory setup, retention, and crash-path publishing.
src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Implements lifecycle initialization, report scanning, pruning, path construction, and final publishing.
src/coreclr/debug/crashreport/inproccrashreportlifecycle.design.md Documents the design rationale and retention model.
src/coreclr/debug/crashreport/inproccrashreporter.h Extends reporter settings with lifecycle root and retention options.
src/coreclr/debug/crashreport/inproccrashreporter.cpp Integrates lifecycle-managed file output into report creation and finalization.
src/coreclr/debug/crashreport/crashreportstringutils.h Adds shared signal-safe string utility declarations.
src/coreclr/debug/crashreport/crashreportstringutils.cpp Implements shared bounded string copy/append helpers.
src/coreclr/debug/crashreport/CMakeLists.txt Includes the new lifecycle and string utility sources in the crashreport object library.

Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
mdh1418 and others added 3 commits June 2, 2026 14:53
Consolidates the follow-up changes addressing maintainer review of the
in-proc crash report lifecycle into a single net change over the prior
pushed state, so there are no intermediate commits that add behavior a
later commit removes.

Net behavior of the lifecycle and its reporter integration:

- Publish completed reports with a same-directory rename guarded by an
  ENOENT existence check, instead of link(); rename is async-signal-safe
  and is permitted in the mobile app-private storage sandboxes where hard
  links are rejected with EPERM.
- Probe the report directory for create/rename/delete at init using local
  stack buffers, disabling file output up front on an unusable directory.
- Reuse CrashReportStringUtils::CopyString rather than a second local
  string-copy helper.
- Require an absolute CrashReportRootPath; the runtime no longer expands a
  leading '~' or $HOME. The integration layer supplies a concrete path.
- Drop the per-app process-name path segment: on mobile the root is already
  app-private, so reports live directly under <root>/.dotnet/crash-reports/.
- Name reports with a nanosecond CLOCK_REALTIME timestamp and drop the
  suffix-retry loop; nanosecond resolution makes collisions vanishingly
  unlikely, so no disambiguating suffix is needed.
- Replace the retention modes with a single bounded model: prune to
  maxFileCount at init (caching the oldest report only when at the bound)
  and unlink that cached oldest on the crash path before publishing a new
  report, so the bound holds whether or not a crash follows.
- Remove the process-liveness probe for stale temp cleanup; a leftover
  .tmp can only come from a defunct run of the same app, so it is removed
  unconditionally.
- Make the lifecycle-managed root the only JSON sink: drop DbgMiniDumpName
  (and its template expansion and cached hostname) from the in-proc
  reporter. With no root configured the reporter runs compact-log-only.
- Clamp configured maxFileCount to [1, INT32_MAX], falling back to the
  default for out-of-range or unparseable values.
- Update the design notes to match the implemented model.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Integrate the in-proc crash report watchdog feature from upstream with the
crash report lifecycle work. Conflict resolution:
- InProcCrashReporterSettings keeps lifecycleRootPath (replacing the removed
  reportPath/DbgMiniDumpName mechanism) and adds the watchdog timeoutSeconds.
- CreateReport keeps the watchdog CrashReportWatchdogScope.
- CrashReportConfigure wires both the lifecycle (lifecycleRootPath/maxFileCount)
  and the watchdog (timeoutSeconds); the old reportPath/dumpName assignment is
  dropped.
- GetCrashReportMaxFileCount now shares the watchdog's
  TryParseCrashReportConfigurationInteger bounded parser (strict, [1, INT32_MAX])
  instead of a local strtol, so both crash-report config integers validate the
  same way.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename settings field lifecycleRootPath -> reportRootPath for clarity and
  consistency with the CrashReportRootPath configuration knob.
- Inline the single-use DeleteCachedReport helper into the crash path.
- Document why retention pruning uses a linear oldest-scan rather than a min-heap.
- Document why TryParseUnsigned is used instead of strtoul/strtoull.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 2, 2026 21:45

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

Comment thread src/coreclr/vm/crashreportstackwalker.cpp Outdated
Comment thread src/coreclr/vm/crashreportstackwalker.cpp
Comment thread src/coreclr/vm/crashreportstackwalker.cpp
Comment thread src/coreclr/inc/clrconfigvalues.h Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp
Comment thread src/coreclr/debug/crashreport/inproccrashreporter.cpp
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.design.md Outdated
- Remove duplicate <errno.h> (crashreportstackwalker.cpp) and <stdlib.h>
  (inproccrashreporter.cpp) includes.
- Correct the CrashReportMaxFileCount config description to a positive
  integer bound; the -1/0 modes were dropped during the earlier
  simplification.
- Inline strtoull/strtoul in TryParseReportName and remove the
  TryParseUnsigned helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp Outdated
Address reviewer feedback on the in-proc crash report lifecycle init path:

- Value-initialize the kept-report array so any never-populated slots stay
  well-defined (hygiene; the slots were never read uninitialized).
- Populate each kept slot directly from the local full path instead of
  copying into a temporary FileInfo and then assigning the whole nested
  struct, removing a redundant ~1KB path copy per retained report. The
  full-set overflow comparison is inlined (timestamp, then path) to avoid
  materializing a candidate FileInfo.
- Use O_TRUNC instead of O_CREAT|O_EXCL for the throwaway temp report file
  and the directory-writability probe. Neither needs the never-overwrite
  guarantee, so O_TRUNC tolerates a stale leftover while still failing on a
  genuinely unwritable directory. This also lets the probe drop its
  preceding unlink of a stale probe file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 3, 2026 17:28

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Comment thread src/coreclr/inc/clrconfigvalues.h Outdated
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp
Comment thread src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp
Comment thread src/coreclr/debug/crashreport/crashreportstringutils.h Outdated

@lateralusX lateralusX left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mdh1418 mdh1418 merged commit 4f916b0 into dotnet:main Jun 4, 2026
114 checks passed
@mdh1418 mdh1418 deleted the inproc_crashreport_lifecycle_impl branch June 4, 2026 15:10
@dotnet-milestone-bot dotnet-milestone-bot Bot added this to the 11.0-preview6 milestone Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants