@deepak-singhal0408 @prabhataravind @prsunny for viz
This problem is caused by the changes in PR #4400
This change is exposing a race condition that can lead to memory corruption. We noticed Orchagent's ZMQ thread crash when METADATA configuration is changed and Orchagent is sent SIGTERM.
Before this change, orchagent terminated via _exit() on SIGTERM and ~OrchDaemon never ran.
This is where main thread is:
main() main.cpp:1047
└── shared_ptr<OrchDaemon>::~shared_ptr shared_ptr.h:175
└── ~OrchDaemon orchdaemon.cpp:139
└── ~PortsOrch portsorch.h:151
└── ~FlexCounterTaggedCachedManager / ~FlexCounterCachedManager
└── ~FlexCounterManager flex_counter_manager.cpp:138
└── stopFlexCounterPolling saihelper.cpp:1106
└── sai_switch_api->set_switch_attribute
└── sairedis::Sai/Server/ClientServerSai::set
└── RedisRemoteSaiInterface::set / notifyCounterOperations
└── RedisRemoteSaiInterface::waitForResponse RedisRemoteSaiInterface.cpp:935
└── ZeroMQChannel::wait ZeroMQChannel.cpp:279
└── zmq_poll ← BLOCKED HERE
- ~FlexCounterManager makes blocking SAI calls during destruction, round-tripping through sairedis's ZMQ channel to syncd. Main thread parks in zmq_poll.
- Meanwhile, multiple ZMQ I/O threads remain alive for various contexts (sairedis channel, north-side ZmqServer, and any ZmqProducerStateTables owned by orchs that were already deleted earlier in the reverse-order loop).
- Orchs deleted earlier likely freed memory still referenced by in-flight ZMQ messages — those zmq_ctx_term calls never happen because owners go out of scope without explicit termination. The crashing thread then dereferences a freed message buffer.
- The new async swss recorder thread (SwSSRec::setAsync(true), started lazily via ensureAsyncWorkerLocked()) adds another draining background thread sharing state with destructed orchs — second-order risk.
The graceful-shutdown signal handlers were added for one narrow purpose: to drain the new async swss.rec worker's queue on SIGTERM/SIGINT so pending records aren't lost on shutdown. Without it, _exit() on SIGTERM would discard whatever the recorder worker hadn't yet flushed to disk.
The problem with the implementation: rather than signaling just the recorder worker to drain and then calling _exit(), the PR set gOrchShutdownRequested, which lets start() return and triggers the entire ~OrchDaemon destructor chain. That's far broader than needed for the goal — and it's the chain that exposes the latent shutdown-ordering bugs (FlexCounterManager doing SAI calls during destruction, ZMQ I/O threads still alive while orchs are torn down) that crashed orchagent.
A narrower fix would have been: signal handler → drain recorder queue → _exit(). No full destructor chain needed.
Signal handlers are always fraught and we should really quantify what we get out of this.
@deepak-singhal0408 @prabhataravind @prsunny for viz
This problem is caused by the changes in PR #4400
This change is exposing a race condition that can lead to memory corruption. We noticed Orchagent's ZMQ thread crash when METADATA configuration is changed and Orchagent is sent SIGTERM.
Before this change, orchagent terminated via _exit() on SIGTERM and ~OrchDaemon never ran.
This is where main thread is:
The graceful-shutdown signal handlers were added for one narrow purpose: to drain the new async swss.rec worker's queue on SIGTERM/SIGINT so pending records aren't lost on shutdown. Without it, _exit() on SIGTERM would discard whatever the recorder worker hadn't yet flushed to disk.
The problem with the implementation: rather than signaling just the recorder worker to drain and then calling _exit(), the PR set gOrchShutdownRequested, which lets start() return and triggers the entire ~OrchDaemon destructor chain. That's far broader than needed for the goal — and it's the chain that exposes the latent shutdown-ordering bugs (FlexCounterManager doing SAI calls during destruction, ZMQ I/O threads still alive while orchs are torn down) that crashed orchagent.
A narrower fix would have been: signal handler → drain recorder queue → _exit(). No full destructor chain needed.
Signal handlers are always fraught and we should really quantify what we get out of this.