feat: opt-in nonBlocking Command for long-running ZMQ operations (#1243)#1244
feat: opt-in nonBlocking Command for long-running ZMQ operations (#1243)#1244ruck314 wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an opt-in nonBlocking=True execution mode for pyrogue Commands to support long-running operations over ZMQ without tying up clients, using a worker-thread + polling model, plus documentation, examples, and integration/regression tests.
Changes:
- Introduces
BaseCommand(nonBlocking=True)with per-call worker thread and auto-injected{name}Running/{name}Result/{name}ErrorsiblingLocalVariables for client polling. - Hooks shutdown/attachment behavior to support injected siblings and worker joining (
Device._rootAttachediteration snapshot;Device._stopjoins command workers). - Adds a cookbook example and a new Sphinx page describing long-running operation patterns, plus integration tests for
VirtualClientandSimpleClient.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/interfaces/test_zmq_client.py | Comment tweaks clarifying concurrent ZMQ REQ corruption repro/fix expectations. |
| tests/interfaces/test_interfaces_simple_client_nonblocking.py | New integration test validating SimpleClient.exec returns immediately and polling variables update. |
| tests/interfaces/test_interfaces_command_nonblocking.py | New end-to-end ZMQ tests for blocking baseline vs non-blocking behavior, re-entry policy, shutdown join, error path. |
| tests/core/test_core_command_nonblocking_construction.py | New core tests for construction, sibling injection, and collision handling. |
| python/pyrogue/interfaces/_Virtual.py | Docstring cross-references to new long-running operations page. |
| python/pyrogue/examples/_NonBlockingCommandExample.py | New runnable cookbook example demonstrating polling loop for a non-blocking command. |
| python/pyrogue/examples/init.py | Exports the new non-blocking command example. |
| python/pyrogue/_Device.py | Ensures safe _rootAttached iteration when nodes are injected; joins command workers during stop. |
| python/pyrogue/_Command.py | Implements nonBlocking worker-thread dispatch, status sibling injection, and stop/join support. |
| docs/src/pyrogue_tree/core/local_command.rst | Adds “What to explore next” link to long-running operations docs. |
| docs/src/pyrogue_tree/core/command.rst | Adds “What to explore next” link to long-running operations docs. |
| docs/src/pyrogue_tree/client_interfaces/virtual.rst | Adds note + related-topic link pointing to long-running operations guidance. |
| docs/src/pyrogue_tree/client_interfaces/simple.rst | Adds related-topic link pointing to long-running operations guidance. |
| docs/src/pyrogue_tree/builtin_devices/process.rst | Adds related-topic link to long-running operations comparison page. |
| docs/src/pyrogue_tree/builtin_devices/long_running_operations.rst | New documentation page comparing pr.Process vs Command(nonBlocking=True) with recipes. |
| docs/src/pyrogue_tree/builtin_devices/index.rst | Adds navigation entries pointing to the new long-running operations page. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- _Command.__call__ now uses live thread aliveness as the re-entry guard
instead of self._runEn, closing the race where _stopWorker cleared the
flag before joining and let a second worker spawn while the first was
still alive.
- Mirror the sync _doFunc path: cache the resolved argument into
self._default and queue an update before dispatching the worker, so
cmd.get() reflects the last argument for non-blocking calls.
- Device._stop joins non-blocking command workers BEFORE stopping
managed interfaces, mirroring Process._stop calling _stopProcess
before Device._stop. This keeps interfaces/memory alive for worker
functions that touch them during teardown.
- examples/_NonBlockingCommandExample.main now raises TimeoutError on
deadline expiry, matching its documented behavior.
- long_running_operations.rst overview no longer claims linkTimeout
closes the connection; explains linkTimeout's idle semantics and
the role of requestStallTimeout.
- long_running_operations.rst comparison row clarifies that
{name}Running toggles per call, {name}Error clears at dispatch, and
{name}Result is sticky on success.
- test_root_stop_joins_in_flight_worker drops the over-strict
post_threads <= pre_threads assertion that could capture unrelated
background threads; the .worker leak check remains.
Add an opt-in `nonBlocking=True` kwarg on `BaseCommand` so long-running
commands invoked over ZMQ (`SimpleClient.exec` / `VirtualClient`) no
longer tie the client up past `linkTimeout` / `requestStallTimeout`.
`pr.Process` and `Command(nonBlocking=True)` are now the two canonical
patterns for long-running work, documented side-by-side.
Core API (python/pyrogue/_Command.py, python/pyrogue/_Device.py)
- BaseCommand(nonBlocking=True) opt-in kwarg. Default behaviour
unchanged -- sync call() still returns the function's value.
- RemoteCommand explicitly forwards nonBlocking to BaseCommand.__init__
only; it is not in **kwargs and therefore cannot reach
RemoteVariable.__init__.
- Worker thread per call (daemon=False, name "{path}.worker").
Auto-injects three sibling LocalVariables on the parent Device:
{name}Running (bool), {name}Result (last return value, sticky on
success), {name}Error (last exception string, cleared at dispatch).
- Re-entry guard uses live thread aliveness (self._thread.is_alive()),
not the _runEn flag, closing the race where _stopWorker cleared the
flag before joining.
- Sync-path parity: when the callback accepts an arg and this is a
LocalCommand, the resolved argument is cached into self._default and
an update is queued before worker dispatch, so cmd.get() reflects
the last invocation.
- Sibling-injection collision guard: pre-checks parent._nodes for
{name}Running / {name}Result / {name}Error before any mutation;
raises pr.NodeError on conflict so a third-sibling collision never
leaves a half-injected state.
- Device._stop() joins non-blocking command workers BEFORE stopping
managed interfaces, mirroring Process._stop calling _stopProcess
first. Worker functions that touch interfaces or memory during
teardown stay safe.
- Device._rootAttached snapshots _nodes before iterating so the
sibling-variable injection done by child BaseCommand._rootAttached
cannot raise RuntimeError: dictionary changed size during iteration.
- Worker lifecycle (_runWorker, _stopWorker, _stop) mirrors pr.Process,
including the self-thread guard that prevents deadlock if the user
function triggers Root.stop() from within itself.
Cookbook example (python/pyrogue/examples/_NonBlockingCommandExample.py)
- ApplyConfig non-blocking LocalCommand plus SimpleClient hand-rolled
Running / Result / Error poll loop.
- main() raises TimeoutError on deadline expiry to match its documented
behaviour.
- Exported from the pyrogue.examples barrel (examples/__init__.py).
- Runnable headlessly (python -m
pyrogue.examples._NonBlockingCommandExample); also used as the
literalinclude source for the docs polling-pattern snippet.
Documentation (docs/src/)
- New Sphinx page
pyrogue_tree/builtin_devices/long_running_operations.rst -- decision
rule, side-by-side comparison table (pr.Process vs
Command(nonBlocking=True)), parallel recipes, client-side polling
pattern, and an explicit "Interaction with linkTimeout and
requestStallTimeout" section explaining that linkTimeout is an idle
timeout and that requestStallTimeout defaults to disabled.
- versionadded:: 6.14.0 on the new page and on the nonBlocking
parameter in BaseCommand's docstring.
- :ref: cross-references inside BaseCommand.nonBlocking (_Command.py)
and VirtualClient.linkTimeout / requestStallTimeout (_Virtual.py) so
autoclass and help() surface the link next to the relevant parameter.
- Related Topics / What To Explore Next cross-links added to
process.rst, core/command.rst, core/local_command.rst,
client_interfaces/virtual.rst, client_interfaces/simple.rst, and
builtin_devices/index.rst so a #1243-style reader lands on the new
page from any entry point.
Tests
- tests/core/test_core_command_nonblocking_construction.py --
backward-compat construction, RemoteCommand(nonBlocking=True)
attaches cleanly, sibling-injection collision raises pr.NodeError.
- tests/interfaces/test_interfaces_command_nonblocking.py --
happy-path, re-entry rejection, Root.stop() clean-join (no .worker
thread survives shutdown), error path (str(exc) in Error,
Running=False, Result unchanged).
- tests/interfaces/test_interfaces_simple_client_nonblocking.py --
SimpleClient.exec returns in well under 1 s for a multi-second
non-blocking command; polls ApplyConfigRunning via full Top.Dev.*
paths.
Compatibility
- Default BaseCommand behaviour unchanged. nonBlocking=True is opt-in.
- No new C++ dependencies. Pure-Python worker, mirroring pr.Process.
- No changes to existing examples or downstream consumers
(python/pyrogue/examples/, python/pyrogue/pydm/) unless explicitly
migrated.
- Targeted at v6.14.0 -- does not block RC #1240 / v6.13.1.
There was a problem hiding this comment.
This looks like a very thorough implementation of this feature. One potential request: Would it make sense to make the blocking/non-blocking behaviour an argument to the call method itself? Whether or not a blocking return is desired is typically more a function of the context where it is called rather than the nature of the command itself. I think this is more similar to what was available in the EPICS interface as well.
|
Closing this for now (reviewing whether the feature is needed before moving forward) |
Summary
Closes the discoverability and ergonomics gap reported in #1243: a long-running
Commandinvoked over ZMQ (SimpleClient.exec/VirtualClient) ties up the client pastlinkTimeout/requestStallTimeout, andpr.Processwas the only documented escape hatch.This PR adds an opt-in
nonBlocking=Truekwarg onBaseCommandso a user with an existingpr.LocalCommand(function=long_thing)can flip one flag and stop fighting client timeouts.pr.ProcessandCommand(nonBlocking=True)are now the two canonical patterns for long-running work, documented side-by-side.What ships
Core API (
python/pyrogue/_Command.py,python/pyrogue/_Device.py)BaseCommand(nonBlocking=True)opt-in kwarg. Default behavior unchanged — synccall()returns the function's value as before.RemoteCommandexplicitly forwardsnonBlockingtoBaseCommand.__init__only; it is not in**kwargsand therefore cannot reachRemoteVariable.__init__.daemon=False, name{path}.worker). Auto-injects three siblingLocalVariables on the parentDevice:{name}Running(bool),{name}Result(last return value, sticky on success),{name}Error(last exception string, cleared at dispatch).self._thread.is_alive()), not the_runEnflag, closing the race where_stopWorkercleared the flag before joining.argand this is aLocalCommand, the resolved argument is cached intoself._defaultand an update is queued before worker dispatch, socmd.get()reflects the last invocation.parent._nodesfor{name}Running/{name}Result/{name}Errorbefore any mutation; raisespr.NodeErroron conflict so a third-sibling collision never leaves a half-injected state.Device._stop()joins non-blocking command workers before stopping managed interfaces, mirroringProcess._stopcalling_stopProcessfirst. Worker functions that touch interfaces or memory during teardown stay safe.Device._rootAttachedsnapshots_nodesbefore iterating so the sibling-variable injection done by childBaseCommand._rootAttachedcannot raiseRuntimeError: dictionary changed size during iteration._runWorker,_stopWorker,_stop) mirrorspr.Process— including the self-thread guard that prevents deadlock if the user function triggersRoot.stop()from within itself.Cookbook example (
python/pyrogue/examples/_NonBlockingCommandExample.py)ApplyConfignon-blockingLocalCommandplusSimpleClienthand-rolledRunning/Result/Errorpoll loop.main()raisesTimeoutErroron deadline expiry to match its documented behavior.pyrogue.examplesbarrel (examples/__init__.py).python -m pyrogue.examples._NonBlockingCommandExample); also used as theliteralincludesource for the docs polling-pattern snippet.Documentation (
docs/src/)pyrogue_tree/builtin_devices/long_running_operations.rst— decision rule, side-by-side comparison table (pr.ProcessvsCommand(nonBlocking=True)), parallel recipes, client-side polling pattern, and an explicit "Interaction with linkTimeout and requestStallTimeout" section explaining thatlinkTimeoutis an idle timeout and thatrequestStallTimeoutdefaults to disabled... versionadded:: 6.14.0on the new page and on thenonBlockingparameter inBaseCommand's docstring.:ref:cross-references insideBaseCommand.nonBlocking(_Command.py) andVirtualClient.linkTimeout/requestStallTimeout(_Virtual.py) soautoclassandhelp()surface the link next to the relevant parameter.process.rst,core/command.rst,core/local_command.rst,client_interfaces/virtual.rst,client_interfaces/simple.rst, andbuiltin_devices/index.rstso a Long running commands can trigger ZMQ timeouts #1243-style reader lands on the new page from any entry point.Tests
tests/core/test_core_command_nonblocking_construction.py— backward-compat construction,RemoteCommand(nonBlocking=True)attaches cleanly, sibling-injection collision raisespr.NodeError.tests/interfaces/test_interfaces_command_nonblocking.py— happy-path, re-entry rejection,Root.stop()clean-join (no.workerthread survives shutdown), error path (str(exc)inError,Running=False,Resultunchanged).tests/interfaces/test_interfaces_simple_client_nonblocking.py—SimpleClient.execreturns in well under 1 s for a multi-second non-blocking command; pollsApplyConfigRunningvia fullTop.Dev.*paths.Compatibility
BaseCommandbehavior unchanged.nonBlocking=Trueis opt-in.pr.Process.python/pyrogue/examples/,python/pyrogue/pydm/) unless explicitly migrated.