Skip to content

Wire Celeritas offload initialization and finalization automatically from Geant4 state transitions#2381

Open
rahmans1 wants to merge 47 commits into
celeritas-project:developfrom
rahmans1:g4-state-change-production
Open

Wire Celeritas offload initialization and finalization automatically from Geant4 state transitions#2381
rahmans1 wants to merge 47 commits into
celeritas-project:developfrom
rahmans1:g4-state-change-production

Conversation

@rahmans1

@rahmans1 rahmans1 commented May 1, 2026

Copy link
Copy Markdown
Contributor

Title

Wire Celeritas offload initialization and finalization automatically from Geant4 state transitions for TMI path

Summary

This PR wires Celeritas tracking-manager integration into Geant4 state transitions so Celeritas setup and
teardown can be driven automatically by Geant4 lifecycle changes. This removes the need for the DD4hep
CelerRun action and lets tracking-manager integration initialize shared and thread-local offload state
from G4VStateDependent callbacks.

Changes

  • Add Geant4 state-change callback infrastructure and route begin/end run transitions through
    IntegrationSingleton.
  • Register master and worker StateDependent hooks for tracking-manager integration.
  • Own the master state hook in IntegrationSingleton as a std::unique_ptr, alongside
    SetupOptionsMessenger; keep worker hooks thread-local.
  • Pass StreamId through state-change callbacks so lifecycle logic does not infer stream identity from
    Geant4 thread queries.
  • Document each on_state_change guard for repeated MT setup transitions, multiple BeamOn calls, missing
    options, tasking stream reuse, and workers with no local offload state.
  • Remove obsolete DD4hep CelerRun action code and steering-file usage.
  • Remove the old SharedParams local CoreState registration path; SharedParams now only tracks stream
    count.
  • Remove explicit GeantSd local processor cleanup by tracking local hit processors through weak_ptr and
    allowing recreation after ownership release.
  • Update integration and GeantSd tests for repeated runs, auto-hook behavior, state-change callbacks, and
    local processor recreation.

Assisted-by: Claude Code (claude-opus-4-7, claude-opus-4-6, claude-sonnet-4-6) and Codex (GPT-5.5)

Sakib Rahman added 2 commits May 1, 2026 14:14
Add hooks for driving Celeritas offload init/finalize from Geant4 state
transitions: a verify-callback setter, an on_state_change dispatcher, and an
auto_hooks_active flag. The dispatcher gates master init/finalize behind a
run-depth counter so that begin_run/end_run pairs fired per worker on the
master thread (in MT mode) only initialize once and finalize once.

Prompt: "Add infrastructure to IntegrationSingleton so Celeritas offload
init/finalize can be driven by Geant4 StateDependent callbacks. Provide a
verify-callback setter, an on_state_change dispatcher that calls
initialize_offload on begin_run and finalize_offload on end_run, and an
auto_hooks_active flag. Dedup begin_run/end_run on the master thread (which
sees one pair per worker in MT mode) using a depth counter so init runs once
and finalize runs once."

Assisted-by: Claude Code (claude-opus-4-7)
Register a master-thread StateDependent in IntegrationSingleton::setup_options
(also setting auto_hooks_active_) and a thread-local StateDependent in
TrackingManagerConstructor::ConstructProcess so begin_run and end_run drive
offload init/finalize. Install verify_local_setup as the verify callback from
TrackingManagerIntegration's constructor. Skip
IntegrationBase::BeginOf/EndOfRunAction when auto hooks are active so
init/finalize is not driven twice.

Prompt: "Use the IntegrationSingleton state-change hooks to drive offload
init and finalize automatically. Register a master-thread StateDependent in
setup_options (and set auto_hooks_active) and a thread-local StateDependent
in TrackingManagerConstructor::ConstructProcess. Install verify_local_setup
as the verify callback from TrackingManagerIntegration's constructor.
Short-circuit IntegrationBase::BeginOf/EndOfRunAction when auto hooks are
active."

Assisted-by: Claude Code (claude-opus-4-7)
@github-actions

github-actions Bot commented May 1, 2026

Copy link
Copy Markdown

Test summary

 5 701 files   9 289 suites   18m 52s ⏱️
 2 332 tests  2 289 ✅  43 💤 0 ❌
32 945 runs  32 774 ✅ 171 💤 0 ❌

Results for commit 0e2e8b1.

♻️ This comment has been updated with latest results.

Sakib Rahman added 4 commits May 1, 2026 17:20
Move the master-thread StateDependent registration and auto_hooks_active_
flag from IntegrationSingleton::setup_options() to TrackingManagerIntegration
constructor. This prevents FSI/UAI from hitting the early-return bug when
worker threads lack their own StateDependent monitors.

FSI/UAI behavior is unchanged: they continue to rely on manual
BeginOfRunAction/EndOfRunAction calls to drive initialization.

Add TMIAutoHooks test to verify auto hooks work without explicit action calls.

Prompt: "Move master-thread StateDependent registration from setup_options()
to TrackingManagerIntegration constructor to prevent FSI/UAI worker thread
initialization failures."

Assisted-by: Claude Code (claude-opus-4-7)
In Geant4 MT mode, G4StateManager is thread-local. The master-thread
StateDependent (registered in TrackingManagerIntegration ctor) only
fires for master-thread state transitions, so worker threads never
received begin_run and initialize_offload was never called for them,
causing "Celeritas was not initialized properly" errors on all :mt tests.

Register a per-worker StateDependent in ConstructProcess so each
worker's G4StateManager dispatches begin_run/end_run to on_state_change.
Guard end_program finalization to skip MT workers (already finalized at
end_run) to prevent a double-finalize SEGFAULT during teardown.

Replace the fragile master_run_depth_ counter with a params_-idempotency
guard: master begin_run is skipped when params_ is already set, and
master end_run is skipped in MT (deferred to end_program). Remove the
old per-ConstructProcess StateDependent registration (now only done for
workers); master registration remains in TrackingManagerIntegration ctor.

Comment out CelerRun in steeringFile.py; auto-hook registration now
handles run initialization without an explicit run action.

Prompt: "In Geant4 MT mode, the auto-hook StateDependent is only registered
on the master thread in TrackingManagerIntegration ctor, but G4StateManager
is thread-local, so worker threads never receive begin_run notifications.
initialize_offload is never called for workers, causing 'Celeritas was not
initialized properly (maybe BeginOfRunAction was not called?)' on all :mt
test variants. Fix by registering a per-worker StateDependent in
ConstructProcess. Guard end_program finalization to skip MT workers since
they already finalized at end_run, preventing a double-finalize SEGFAULT
during teardown. Replace master_run_depth_ depth counter with a
params_-idempotency guard for cleaner master begin_run handling."

Assisted-by: Claude Code (claude-sonnet-4-6)
Clear the registered Geant4 sensitive-detector hit processor and shared core
state slot before finalizing a local EM transporter. This lets worker threads
reinitialize after Geant4 emits begin_run/end_run pairs during MT run-manager
initialization, avoiding stale per-stream pointers and state ownership checks.

Gate the cleanup to LocalTransporter so pure optical offload paths do not call
EM-specific shared-parameter accessors. Add a focused GeantSd test verifying
that a local hit processor can be recreated for the same stream after cleanup.

Prompt: "Fix MT TrackingManagerIntegration failures caused by worker
begin_run/end_run pairs during Geant4 run-manager initialization leaving stale
per-stream GeantSd hit processors and SharedParams state slots; clear those
local resources during finalization so workers can reinitialize cleanly."

Assisted-by: Claude Code (claude-opus-4-6) and Codex (GPT-5)
Move StateDependent registration and auto_hooks_active_ guard ahead of
tracking-manager construction in ConstructProcess. Workers must join the
state-change callback chain before options are validated; delaying
registration caused extra "cannot access offload before options are set"
errors during worker construction rather than at begin_run.

Add MT-only early return in ConstructProcess when options absent so
workers skip transporter creation without spurious errors.

In on_state_change, skip the first missing-options begin_run that
G4MTRunManager::Initialize fires before BeamOn. Track with
skipped_missing_options_init_ so real BeamOn begin_run still surfaces
the expected missing-options diagnostic. Serial behavior unchanged.

Prompt: "Fix MT TrackingManagerIntegration no_set_options test: workers
must register StateDependent in ConstructProcess before options are
checked, and the spurious begin_run from G4MTRunManager::Initialize must
be skipped in MT mode so the real BeamOn still reports the expected
missing-options error."

Assisted-by: Claude Code (claude-opus-4-6) and Codex (GPT-5)
@codecov

codecov Bot commented May 6, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 90.00000% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.31%. Comparing base (c5d2318) to head (0e2e8b1).

Files with missing lines Patch % Lines
src/accel/detail/IntegrationSingleton.cc 85.71% 7 Missing and 4 partials ⚠️
src/accel/TrackingManagerConstructor.cc 73.68% 2 Missing and 3 partials ⚠️
src/accel/SharedParams.cc 60.00% 1 Missing and 1 partial ⚠️
src/celeritas/ext/GeantSd.cc 96.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2381      +/-   ##
===========================================
- Coverage    87.33%   87.31%   -0.02%     
===========================================
  Files         1401     1401              
  Lines        44443    44587     +144     
  Branches     13495    13532      +37     
===========================================
+ Hits         38813    38932     +119     
- Misses        4408     4419      +11     
- Partials      1222     1236      +14     
Files with missing lines Coverage Δ
src/accel/IntegrationBase.cc 89.13% <100.00%> (+5.25%) ⬆️
src/accel/LocalTransporter.cc 71.09% <ø> (+0.33%) ⬆️
src/accel/SharedParams.hh 93.54% <100.00%> (+0.21%) ⬆️
src/accel/TrackingManagerIntegration.cc 62.85% <100.00%> (+4.79%) ⬆️
src/accel/detail/IntegrationSingleton.hh 75.00% <100.00%> (+8.33%) ⬆️
src/celeritas/ext/GeantSd.hh 100.00% <ø> (ø)
src/celeritas/g4/StateDependent.cc 92.15% <100.00%> (+0.85%) ⬆️
src/celeritas/g4/StateDependent.hh 100.00% <ø> (ø)
src/celeritas/ext/GeantSd.cc 87.09% <96.00%> (+1.38%) ⬆️
src/accel/SharedParams.cc 84.70% <60.00%> (-0.27%) ⬇️
... and 2 more

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sakib Rahman and others added 3 commits May 6, 2026 19:29
Add IntegrationBase::use_auto_hooks as an integration-specific policy
hook and default it to false so the common BeginOfRunAction and
EndOfRunAction implementations continue to drive manual integration
paths. Gate the auto_hooks_active early returns on that policy instead
of the global singleton state alone.

Override the policy in TrackingManagerIntegration so only the
tracking-manager path suppresses explicit run-action calls when the
StateDependent auto hooks have been registered.

Prompt: "Refine the Geant4 auto-hook changes so auto-hook-driven BeginOfRunAction and EndOfRunAction suppression is implemented through an IntegrationBase policy hook, defaulting to manual run-action behavior for UserActionIntegration, FastSimulationIntegration, and other non-tracking-manager paths, with TrackingManagerIntegration as the only integration that opts into suppression when StateDependent auto hooks are active."

Assisted-by: Codex (GPT-5.5)
Track whether TrackingManagerConstructor was created from TrackingManagerIntegration
with an internal provenance flag. Use that flag to keep the TMI-backed
constructor path tied to StateDependent auto hooks while leaving the lower-level
SharedParams/local-offload constructor path on its existing manual lifecycle.

Only the TMI-backed path now requires auto_hooks_active, registers per-worker
StateDependent callbacks, and returns early in MT mode when setup options are
missing. Add a G4ThreadLocal guard so each worker registers the TMI state hook
at most once.

Prompt: "Refine TrackingManagerConstructor so TrackingManagerIntegration-backed construction always uses the StateDependent auto-hook lifecycle, while the lower-level constructor that receives SharedParams and a local offload callback is not blocked by TMI auto-hook state; implement this with an internal constructor-provenance flag and guard per-worker auto-hook registration against duplicates."

Assisted-by: Codex (GPT-5.5)
@rahmans1 rahmans1 marked this pull request as ready for review May 7, 2026 00:03
@rahmans1 rahmans1 requested review from sethrj May 7, 2026 00:04
Sakib Rahman added 4 commits May 6, 2026 22:26
Add StateDependent::RegisterWithGeant to centralize the intentional
G4StateManager-managed registration of Geant4 state-change dependents. The
helper returns no ownership handle, documents why the object must not be kept
as Celeritas member data, and releases the constructed dependent after Geant4
registration rather than exposing ownership at call sites.

Update TrackingManagerIntegration and TrackingManagerConstructor to use the
helper instead of constructing StateDependent directly at their registration
sites.

Prompt: "Fix clang-tidy leak diagnostics for StateDependent auto-hook registration by centralizing Geant4-owned registration in a void-returning RegisterWithGeant helper that constructs the dependent with make_unique, releases it after registration, documents why Celeritas must not keep member ownership, and replaces direct StateDependent construction in TrackingManagerIntegration and TrackingManagerConstructor."

Assisted-by: Codex (GPT-5.5)
The intentional lifetime transfer to Geant4 in RegisterWithGeant triggers
clang-analyzer-cplusplus.NewDeleteLeaks. Add NOLINTNEXTLINE to suppress,
as the doxygen comment already documents why the object must not be kept
as Celeritas member data.

Prompt: "Add NOLINTNEXTLINE to suppress clang-analyzer in StateDependent registration"

Assisted-by: Claude Code (claude-haiku-4-5)
Replace make_unique(...).release() in StateDependent::RegisterWithGeant with
the direct raw allocation used for Geant4 registration. CI clang-tidy flags the
release form for both bugprone-unused-return-value and NewDeleteLeaks, while
the raw allocation needs only the existing targeted analyzer suppression.

Prompt: "Switch StateDependent::RegisterWithGeant from make_unique(...).release() to direct raw new because CI clang-tidy flags the release form as an unused return value and a potential leak."

Assisted-by: Codex (GPT-5.5)
Use a NOLINTBEGIN/NOLINTEND block around StateDependent::RegisterWithGeant
because clang-tidy reports the intentional Geant4-owned allocation at function
exit rather than on the raw allocation line. Keep the suppression scoped to the
registration helper.

Prompt: "Widen the clang-analyzer NewDeleteLeaks suppression around StateDependent::RegisterWithGeant because CI reports the intentional Geant4-owned allocation at the function closing brace rather than on the raw new line."

Assisted-by: Codex (GPT-5.5)
@rahmans1 rahmans1 force-pushed the g4-state-change-production branch from 21addb8 to 41cc930 Compare May 7, 2026 10:14
Sakib Rahman and others added 5 commits May 7, 2026 10:42
Replace the leak-based Geant state-dependent registration used by tracking
manager integration with same-thread unique_ptr ownership for the master and
worker hooks. On end_program, StateDependent deregisters from Geant4 before
invoking the callback, and the callback resets the owning pointer on the same
thread. Remove the RegisterWithGeant leak helper and clang-analyzer leak
suppression.

Update the StateDependent test to exercise callback-driven destruction during
end_program.

Prompt: "Convert StateDependent tracking-manager hooks from intentionally leaked allocations to same-thread unique_ptr ownership after noticing that Notify already deregisters from G4StateManager on G4State_Quit/G4State_Abort. Reset the master and worker hooks during end_program after deregistration, remove the RegisterWithGeant leak helper and clang-analyzer leak suppression, and add focused coverage for callback-driven cleanup."
Allow the tracking-manager state diagnostic to skip run and event assertions for
task-runner worker streams that initialize but do not process events. These
streams can record initialization state changes without ever seeing begin_run in
small task-mode test runs.

Prompt: "Relax the tracking-manager state-dependent diagnostic test for task-runner worker streams that initialize without processing events, while keeping run and event assertions unchanged for streams that enter a run or process events."

Assisted-by: Codex (GPT-5.5)

@sethrj sethrj left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments so far...

Comment thread example/ddceler/input/steeringFile.py Outdated
* to \c end_program so that the spurious end_run cannot tear down shared state
* prematurely. Workers and serial mode finalize normally at end_run.
*/
void IntegrationSingleton::on_state_change(GeantStateChange change)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the new callback includes the stream ID without g4threading, it would be best to include that as part of the signature instead of calling the error-prone master/mt .

Comment on lines +363 to +366
// G4MTRunManager::Initialize emits a run-like state
// transition before BeamOn. Preserve the explicit
// missing-options error for the real run.
skipped_missing_options_init_ = true;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fragile; I don't know if we can guarantee this is going to be the same for future versions...

Comment on lines +355 to +371
if (!options_ && is_mt)
{
if (!is_master)
{
break;
}
if (!skipped_missing_options_init_)
{
// G4MTRunManager::Initialize emits a run-like state
// transition before BeamOn. Preserve the explicit
// missing-options error for the real run.
skipped_missing_options_init_ = true;
break;
}
}
if (is_master && params_)
break;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain this logic...

case GeantStateChange::end_run:
if (is_master && is_mt)
break;
if (!params_)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstance is this true? it also circumvents other "finalize offload" logic.

Comment on lines +384 to +385
if (is_master && is_mt)
break;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this needed?

hit_manager->clear_local_processor(stream_id);
}
params_.clear_state(stream_id.get());
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain why this is needed; also I don't know if you saw but the set_state has a comment about how it's hacky and is meant to be removed; so this seems like a step in the wrong direction.

bool failed_setup_{false};
bool auto_hooks_active_{false};
bool skipped_missing_options_init_{false};
std::function<void()> verify_callback_;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try adding a "state change" member as a unique ptr as is done with the SetupOptionsMessenger, which has some of the same weird memory semantics?

rahmans1 and others added 3 commits May 9, 2026 10:03
Delete the DD4hep CelerRun action plugin now that Geant4 state hooks drive
Celeritas run initialization and finalization automatically. Remove the stale
steering-file reference and plugin build entry.

Prompt: "Remove the obsolete DD4hep CelerRun run-action integration: delete the stale steering-file CelerRun reference, remove the CelerRun plugin from the ddceler build, and delete the CelerRun source/header now that automatic Geant4 state hooks handle initialization and finalization."
Assisted-by: Codex (GPT-5)
Pass the StateDependent stream ID through automatic Geant4 state transitions
instead of recomputing master/worker state at each callback. Update the
runtime verification callback to receive the stream ID, remove the fragile
missing-options skip counter, centralize already-finalized handling in
finalize_offload, and document the MT master/shared finalization deferral.

Prompt: "Make the following changes to the state-hook lifecycle: include the StateDependent stream ID in state-change and verification callback signatures, simplify the fragile automatic begin/end run handling, centralize already-finalized handling, and explain the MT master/shared finalization deferral."
Assisted-by: Codex (GPT-5.5)
Sakib Rahman and others added 7 commits May 9, 2026 14:31
Remove the obsolete SharedParams local state registration path and keep only the
stream count needed by diagnostics and setup. LocalTransporter no longer
registers its CoreState with SharedParams, and local finalization no longer
clears shared state. Hit manager local processor cleanup remains in
IntegrationSingleton so Geant4-owned thread-local data is released before the
local transporter drops its processor reference.

Prompt: "Remove and replace the hacky SharedParams set_state/clear_state path:
remove unnecessary SharedParams local CoreState registration, preserve correct
stream-count handling, and keep only the thread-local Geant4 hit-processor
cleanup needed during local finalization."

Assisted-by: Codex (GPT-5.5)
Track GeantSd local hit processors only through weak pointers so a
LocalTransporter-owned processor can expire naturally when local offload is
finalized. This allows a reused stream ID to create a fresh processor without
requiring IntegrationSingleton to clear GeantSd state before dropping the local
transporter.

Prompt: "Avoid explicit local hit-processor cleanup in IntegrationSingleton by moving local hit-processor
lifetime handling into GeantSd weak ownership."

Assisted-by: Codex (GPT-5.5)
Clarify why each IntegrationSingleton::on_state_change guard is needed for
Geant4-driven initialization and finalization. The comments now explain
duplicate master setup, missing-options retries, worker callbacks before shared
params exist, tasking stream reuse, MT master end-run deferral, uninitialized
worker end-run callbacks, and the end-program finalization fallback.

Prompt: "Document the IntegrationSingleton::on_state_change guard conditions with
with concrete explanations for each branch."

Assisted-by: Codex (GPT-5.5)
Rename the GeantSd local processor test to describe processor recreation and
remove the explicit clear_local_processor call. The test now verifies that a
processor can be recreated after the previous LocalTransporter-owned processor
is released.

Prompt: "Remove the explicit clear_local_processor call in Geant SD test and describe local processor
recreation through natural ownership release. Rename the test accordingly."

Assisted-by: Codex (GPT-5.5)
Let IntegrationSingleton own the master-thread StateDependent hook alongside
the SetupOptionsMessenger and expose register_auto_hooks for tracking manager
integration. This keeps the Geant4 state-hook lifetime with the lifecycle owner
while TrackingManagerIntegration only requests automatic hook registration.

Clarify why auto_hooks_active suppresses user run-action begin/end calls and
why tracking-manager installation is gated on the master state hook.

Prompt: "Move master-thread StateDependent ownership into IntegrationSingleton,
add a register_auto_hooks helper for tracking manager integration, keep worker
StateDependent hooks thread-local, and document why auto_hooks_active suppresses
manual begin/end run handling and gates tracking-manager installation."

Assisted-by: Codex (GPT-5.5)

@sethrj sethrj left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Sakib, sorry this has taken so long to review: vibe coding needs extra scrutiny because the code "looks" well documented and reasonable but that can hide major issues. The changes here take something that's already complicated and make it much more so. Maybe we should get together tomorrow to talk. I hate to keep getting in the way of your progress toward EPIC integration… especially if the main software architecture blockers are not in mainstream use.

Comment thread src/celeritas/ext/GeantSd.cc Outdated
Comment on lines +270 to +274
CELER_EXPECT(sid < processor_weakptrs_.size());
auto sp = processor_weakptrs_[sid.get()].lock();
CELER_EXPECT(sp);

return *sp;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to remove the weak pointers or is that a side effect of the AI changes? The (older and suboptimal) way that the GeantSd is called means that every thread will try to lock the pointer at every step iteration, and locks can be quite expensive. (I expect this will hurt more for CPU than GPU since less work is being done per step iteration).

Comment thread src/celeritas/g4/StateDependent.cc Outdated
Comment on lines +132 to +143
if (change == GeantStateChange::end_program)
{
// Allow the callback to destroy this object by keeping the active
// callable alive outside the object it may own.
auto local_stream = local_stream_;
auto cb = std::move(cb_);
cb(local_stream, change);
}
else
{
this->cb_(local_stream_, change);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will the callback destroy the state dependent? Is it even legal to be inside a member function while it's being destroyed? This seems dangerous…

// TrackingManagerIntegration can be driven by Geant4 StateDependent
// callbacks; avoid duplicating initialization from user run actions.
if (this->use_auto_hooks() && singleton.auto_hooks_active())
return;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you print a warning that the user should remove the run action calls from their code if they're using the TMI? And can you update the examples to match? I think it'd be good to rely more on the automatic state dependent.

CELER_TRY_HANDLE(verify_callback_(stream_id),
ExceptionConverter{"celer.init.verify"});
}
break;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the problems with ML in general, and here specifically for LLM/agentic tools, is a tendency toward model overfitting. There might be five tests that are supposed to be general, but the AI figures out that if it puts five "if" statements in the right place, it makes the tests pass; but the tests are no longer general.

I think that we should probably move this kind of logic to the StateDependent class: maybe that can be a follow-on PR since what you've got at least works (and for all the versions we've got in CI).

Comment thread src/accel/detail/IntegrationSingleton.cc Outdated
rahmans1 and others added 19 commits June 9, 2026 00:22
Move automatic offload lifecycle filtering out of IntegrationSingleton and into
StateDependent lifecycle mode. Normalize Geant4 state ordering before notifying
IntegrationSingleton so the singleton only dispatches Celeritas lifecycle
operations.

Prompt: "Refactor automatic Geant4 lifecycle handling so StateDependent owns the Geant4-specific ordering rules. Add a lifecycle dispatch mode that filters duplicate begin/end run transitions, suppresses manager-thread end-run callbacks, handles end-program dispatch consistently, and emits only normalized begin_run, end_run, and end_program events to IntegrationSingleton. Add or update tests for the normalized lifecycle sequence."

Assisted-by: Codex (GPT-5.5)
Remove Geant4 ordering guards from IntegrationSingleton::on_state_change now
that StateDependent lifecycle mode filters duplicate and manager-thread
transitions before dispatch.

Prompt: "Simplify IntegrationSingleton automatic state-change handling after moving Geant4 lifecycle ordering into StateDependent. Remove guards that interpret Geant4 run-manager behavior, keep on_state_change limited to begin-run initialization, end-run finalization, and end-program finalization, and preserve stream-aware verification callback dispatch."

Assisted-by: Codex (GPT-5.5)
Keep master and worker StateDependent hooks owned by unique_ptrs and stop
resetting lifecycle hook ownership from inside active callbacks.

Prompt: "Update automatic StateDependent hook ownership to avoid destroying the active state-dependent object from inside its callback. Keep the master hook owned by IntegrationSingleton and worker hooks owned by thread-local unique_ptrs, following the SetupOptionsMessenger-style ownership model. Register lifecycle-mode hooks without callback self-reset logic so Geant4 notifications remain callback-only and ownership stays explicit."

Assisted-by: Codex (GPT-5.5)
Print a one-time warning when TrackingManagerIntegration auto hooks are active
and user run actions call manual BeginOfRunAction or EndOfRunAction hooks.

Prompt: "Add user-facing diagnostics for applications that still call manual TrackingManagerIntegration BeginOfRunAction or EndOfRunAction methods while automatic Geant4 state hooks are active. Emit a one-time warning telling users to remove redundant manual Celeritas run-action calls, and update TrackingManagerIntegration API documentation to describe automatic lifecycle initialization/finalization instead of requiring user run-action hooks."

Assisted-by: Codex (GPT-5.5)
Cache local hit processor raw pointers while retaining setup-time weak pointer
ownership tracking. Clear the raw pointer when the owning shared pointer is
destroyed so processors can be recreated safely without locking a weak pointer
during each step.

Prompt: "Address the GeantSd review concern about expensive weak_ptr locking during step processing. Store a per-stream cache slot with a raw HitProcessor pointer for fast local lookup, use weak_ptr locking only during make_local_processor setup or recreation, clear the cached raw pointer from the shared_ptr deleter when the processor is destroyed, and add a focused test proving that duplicate setup reuses the live processor while releasing all owners allows safe recreation."

Assisted-by: Codex (GPT-5.5)
…ns1/celeritas into g4-state-change-production

Bring branch up-to-date with develop
…stroy the active StateDependent object during Notify. Keep terminal deregistration for raw observers, and add an explicit lifecycle role so global monitors emit terminal cleanup while local worker monitors finalize at end_run. Mark TrackingManagerConstructor worker lifecycle hooks as local.

Prompt: "Preserve raw end-program deregistration, remove support for destroying the active StateDependent from inside its callback, make global versus local lifecycle end-program ownership explicit, and mark worker lifecycle hooks as local."

Assisted-by: Codex (GPT-5.5)
… local lifecycle test executables. This avoids mixing raw and lifecycle observers in one Geant4 state manager while covering terminal deregistration and lifecycle role behavior.

Prompt: "Add focused StateDependent tests for raw mode, global lifecycle mode, and local lifecycle mode. Keep each mode in a separate Geant4 test executable so tests cover the new lifecycle role behavior without relying on unsupported simultaneous raw and lifecycle observers or Geant4 dependent iteration order."

Assisted-by: Codex (GPT-5.5)
…ns1/celeritas into g4-state-change-production

Update branch to latest upstream
…Singleton::on_state_change. Both normalized lifecycle events intentionally finalize the offload state, and sharing the case body avoids the

clang-tidy bugprone-branch-clone warning.

Prompt: "Fix the clang-tidy bugprone-branch-clone warning in IntegrationSingleton::on_state_change by combining the identical end_run and end_program finalization branches without changing behavior."

Assisted-by: Codex (GPT-5.5)
…verage into

the existing StateDependent test executable. Drive Geant4 state transitions
directly through G4StateManager so lifecycle filtering is covered without
multiple GeantSetup runs in one process or extra CUDA-linked Geant4 test
executables.

Prompt: "Consolidate StateDependent lifecycle coverage into the existing test executable. Cover raw mode, global lifecycle mode, and local lifecycle mode with direct Geant4 state transitions, and remove the separate lifecycle test executables that fail in this build configuration."

Assisted-by: Codex (GPT-5.5)
…gers.

Verify that automatic StateDependent hooks drive setup without manual run action
calls, accounting for Geant4's different serial and MT lifecycle ordering.

Prompt: "Add full Geant4 integration coverage for TrackingManagerIntegration automatic StateDependent hooks. Register the auto-hook test for both serial and MT run managers, leave manual BeginOfRunAction and EndOfRunAction empty, and verify serial setup at BeamOn plus MT global setup at Initialize and local worker setup at BeamOn."

Assisted-by: Codex (GPT-5.5)
Separate local and shared cleanup ownership for StateDependent auto hooks.
Keep shared Celeritas state alive across repeated BeamOn calls and finalize it
only at terminal end_program. Finalize thread-local offload state at end_run,
including terminal local teardown, and make local setup/finalize idempotent.

Deliberately abandon StateDependent ownership immediately after auto-hook
registration so StateDependent can deregister itself during Geant4 terminal
teardown without being destroyed after G4StateManager teardown.

Add coverage for local terminal cleanup and repeated BeamOn lifecycle behavior.

Prompt: "Make the TMI StateDependent auto-hook lifecycle robust by separating local and shared cleanup ownership, keeping shared state alive across repeated BeamOn calls, finalizing local state at run or terminal teardown, avoiding late StateDependent destruction after Geant4 teardown, and adding tests for repeated BeamOn and local terminal cleanup behavior."

Assited-by: Codex (GPT-5.5)
Keep TrackingManagerIntegration StateDependent auto hooks owned with unique_ptr,
matching the SetupOptionsMessenger ownership pattern.

Remove the intentional release of master and worker state dependents so lifecycle
hooks remain normally owned while StateDependent still deregisters itself during
terminal Geant4 teardown.

Prompt: "Make the current TMI StateDependent auto-hook ownership follow the SetupOptionsMessenger pattern by keeping master and worker hooks owned with unique_ptr, removing intentional release calls, and relying on StateDependent terminal deregistration for Geant4 teardown safety."

Assisted-by: Codex (GPT-5.5)
Rename the lifecycle dispatch helper to better describe its behavior, and
update StateDependent and IntegrationSingleton comments to match the current
raw observer, lifecycle filtering, repeated-run initialization, and terminal
teardown behavior.

Add agent guidance requiring live-code comment reviews after renames or
behavior changes.

Prompt: "Rename StateDependent::notify_lifecycle to a clearer dispatch-oriented name, update all declaration, definition, call-site, and nearby test/integration comments from the live code so they accurately describe raw observer deregistration, lifecycle filtering, repeated-run initialization, and terminal teardown behavior."

Assisted-by: Codex (GPT-5.5)
@rahmans1 rahmans1 requested a review from sethrj June 24, 2026 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants