Skip to content

feat: opt-in nonBlocking Command for long-running ZMQ operations (#1243)#1244

Closed
ruck314 wants to merge 2 commits into
pre-releasefrom
issues/1243
Closed

feat: opt-in nonBlocking Command for long-running ZMQ operations (#1243)#1244
ruck314 wants to merge 2 commits into
pre-releasefrom
issues/1243

Conversation

@ruck314

@ruck314 ruck314 commented May 28, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes the discoverability and ergonomics gap reported in #1243: a long-running Command invoked over ZMQ (SimpleClient.exec / VirtualClient) ties up the client past linkTimeout / requestStallTimeout, and pr.Process was the only documented escape hatch.

This PR adds an opt-in nonBlocking=True kwarg on BaseCommand so a user with an existing pr.LocalCommand(function=long_thing) can flip one flag and stop fighting client timeouts. pr.Process and Command(nonBlocking=True) are now the two canonical patterns for long-running work, documented side-by-side.

What ships

Core API (python/pyrogue/_Command.py, python/pyrogue/_Device.py)

  • BaseCommand(nonBlocking=True) opt-in kwarg. Default behavior unchanged — sync call() returns the function's value as before.
  • RemoteCommand explicitly forwards nonBlocking to BaseCommand.__init__ only; it is not in **kwargs and therefore cannot reach RemoteVariable.__init__.
  • Worker thread per call (daemon=False, name {path}.worker). Auto-injects three sibling LocalVariables on the parent Device: {name}Running (bool), {name}Result (last return value, sticky on success), {name}Error (last exception string, cleared at dispatch).
  • Re-entry guard uses live thread aliveness (self._thread.is_alive()), not the _runEn flag, closing the race where _stopWorker cleared the flag before joining.
  • Sync-path parity: when the callback accepts an arg and this is a LocalCommand, the resolved argument is cached into self._default and an update is queued before worker dispatch, so cmd.get() reflects the last invocation.
  • Sibling-injection collision guard: pre-checks parent._nodes for {name}Running / {name}Result / {name}Error before any mutation; raises pr.NodeError on conflict so a third-sibling collision never leaves a half-injected state.
  • Device._stop() joins non-blocking command workers before stopping managed interfaces, mirroring Process._stop calling _stopProcess first. Worker functions that touch interfaces or memory during teardown stay safe.
  • Device._rootAttached snapshots _nodes before iterating so the sibling-variable injection done by child BaseCommand._rootAttached cannot raise RuntimeError: dictionary changed size during iteration.
  • Worker lifecycle (_runWorker, _stopWorker, _stop) mirrors pr.Process — including the self-thread guard that prevents deadlock if the user function triggers Root.stop() from within itself.

Cookbook example (python/pyrogue/examples/_NonBlockingCommandExample.py)

  • ApplyConfig non-blocking LocalCommand plus SimpleClient hand-rolled Running / Result / Error poll loop.
  • main() raises TimeoutError on deadline expiry to match its documented behavior.
  • Exported from the pyrogue.examples barrel (examples/__init__.py).
  • Runnable headlessly (python -m pyrogue.examples._NonBlockingCommandExample); also used as the literalinclude source for the docs polling-pattern snippet.

Documentation (docs/src/)

  • New Sphinx page pyrogue_tree/builtin_devices/long_running_operations.rst — decision rule, side-by-side comparison table (pr.Process vs Command(nonBlocking=True)), parallel recipes, client-side polling pattern, and an explicit "Interaction with linkTimeout and requestStallTimeout" section explaining that linkTimeout is an idle timeout and that requestStallTimeout defaults to disabled.
  • .. versionadded:: 6.14.0 on the new page and on the nonBlocking parameter in BaseCommand's docstring.
  • :ref: cross-references inside BaseCommand.nonBlocking (_Command.py) and VirtualClient.linkTimeout / requestStallTimeout (_Virtual.py) so autoclass and help() surface the link next to the relevant parameter.
  • "Related Topics" / "What To Explore Next" cross-links added to process.rst, core/command.rst, core/local_command.rst, client_interfaces/virtual.rst, client_interfaces/simple.rst, and builtin_devices/index.rst so a Long running commands can trigger ZMQ timeouts #1243-style reader lands on the new page from any entry point.

Tests

  • tests/core/test_core_command_nonblocking_construction.py — backward-compat construction, RemoteCommand(nonBlocking=True) attaches cleanly, sibling-injection collision raises pr.NodeError.
  • tests/interfaces/test_interfaces_command_nonblocking.py — happy-path, re-entry rejection, Root.stop() clean-join (no .worker thread survives shutdown), error path (str(exc) in Error, Running=False, Result unchanged).
  • tests/interfaces/test_interfaces_simple_client_nonblocking.pySimpleClient.exec returns in well under 1 s for a multi-second non-blocking command; polls ApplyConfigRunning via full Top.Dev.* paths.

Compatibility

  • Default BaseCommand behavior unchanged. nonBlocking=True is opt-in.
  • No new C++ dependencies. Pure-Python worker, mirroring pr.Process.
  • No changes to existing examples or downstream consumers (python/pyrogue/examples/, python/pyrogue/pydm/) unless explicitly migrated.
  • Targeted at v6.14.0 — does not block RC Release Candidate v6.13.1 #1240 / v6.13.1.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in nonBlocking=True execution mode for pyrogue Commands to support long-running operations over ZMQ without tying up clients, using a worker-thread + polling model, plus documentation, examples, and integration/regression tests.

Changes:

  • Introduces BaseCommand(nonBlocking=True) with per-call worker thread and auto-injected {name}Running/{name}Result/{name}Error sibling LocalVariables for client polling.
  • Hooks shutdown/attachment behavior to support injected siblings and worker joining (Device._rootAttached iteration snapshot; Device._stop joins command workers).
  • Adds a cookbook example and a new Sphinx page describing long-running operation patterns, plus integration tests for VirtualClient and SimpleClient.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/interfaces/test_zmq_client.py Comment tweaks clarifying concurrent ZMQ REQ corruption repro/fix expectations.
tests/interfaces/test_interfaces_simple_client_nonblocking.py New integration test validating SimpleClient.exec returns immediately and polling variables update.
tests/interfaces/test_interfaces_command_nonblocking.py New end-to-end ZMQ tests for blocking baseline vs non-blocking behavior, re-entry policy, shutdown join, error path.
tests/core/test_core_command_nonblocking_construction.py New core tests for construction, sibling injection, and collision handling.
python/pyrogue/interfaces/_Virtual.py Docstring cross-references to new long-running operations page.
python/pyrogue/examples/_NonBlockingCommandExample.py New runnable cookbook example demonstrating polling loop for a non-blocking command.
python/pyrogue/examples/init.py Exports the new non-blocking command example.
python/pyrogue/_Device.py Ensures safe _rootAttached iteration when nodes are injected; joins command workers during stop.
python/pyrogue/_Command.py Implements nonBlocking worker-thread dispatch, status sibling injection, and stop/join support.
docs/src/pyrogue_tree/core/local_command.rst Adds “What to explore next” link to long-running operations docs.
docs/src/pyrogue_tree/core/command.rst Adds “What to explore next” link to long-running operations docs.
docs/src/pyrogue_tree/client_interfaces/virtual.rst Adds note + related-topic link pointing to long-running operations guidance.
docs/src/pyrogue_tree/client_interfaces/simple.rst Adds related-topic link pointing to long-running operations guidance.
docs/src/pyrogue_tree/builtin_devices/process.rst Adds related-topic link to long-running operations comparison page.
docs/src/pyrogue_tree/builtin_devices/long_running_operations.rst New documentation page comparing pr.Process vs Command(nonBlocking=True) with recipes.
docs/src/pyrogue_tree/builtin_devices/index.rst Adds navigation entries pointing to the new long-running operations page.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread python/pyrogue/_Command.py
Comment thread python/pyrogue/_Command.py
Comment thread python/pyrogue/_Device.py
Comment thread docs/src/pyrogue_tree/builtin_devices/long_running_operations.rst Outdated
Comment thread docs/src/pyrogue_tree/builtin_devices/long_running_operations.rst Outdated
Comment thread python/pyrogue/examples/_NonBlockingCommandExample.py Outdated
Comment thread tests/interfaces/test_interfaces_command_nonblocking.py Outdated
ruck314 added a commit that referenced this pull request May 28, 2026
- _Command.__call__ now uses live thread aliveness as the re-entry guard
  instead of self._runEn, closing the race where _stopWorker cleared the
  flag before joining and let a second worker spawn while the first was
  still alive.
- Mirror the sync _doFunc path: cache the resolved argument into
  self._default and queue an update before dispatching the worker, so
  cmd.get() reflects the last argument for non-blocking calls.
- Device._stop joins non-blocking command workers BEFORE stopping
  managed interfaces, mirroring Process._stop calling _stopProcess
  before Device._stop.  This keeps interfaces/memory alive for worker
  functions that touch them during teardown.
- examples/_NonBlockingCommandExample.main now raises TimeoutError on
  deadline expiry, matching its documented behavior.
- long_running_operations.rst overview no longer claims linkTimeout
  closes the connection; explains linkTimeout's idle semantics and
  the role of requestStallTimeout.
- long_running_operations.rst comparison row clarifies that
  {name}Running toggles per call, {name}Error clears at dispatch, and
  {name}Result is sticky on success.
- test_root_stop_joins_in_flight_worker drops the over-strict
  post_threads <= pre_threads assertion that could capture unrelated
  background threads; the .worker leak check remains.
@ruck314 ruck314 requested a review from Copilot May 28, 2026 04:42

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated no new comments.

@ruck314 ruck314 changed the title Non-blocking Command for long-running ZMQ operations (#1243) feat: opt-in nonBlocking Command for long-running ZMQ operations (#1243) May 28, 2026
Add an opt-in `nonBlocking=True` kwarg on `BaseCommand` so long-running
commands invoked over ZMQ (`SimpleClient.exec` / `VirtualClient`) no
longer tie the client up past `linkTimeout` / `requestStallTimeout`.
`pr.Process` and `Command(nonBlocking=True)` are now the two canonical
patterns for long-running work, documented side-by-side.

Core API (python/pyrogue/_Command.py, python/pyrogue/_Device.py)

- BaseCommand(nonBlocking=True) opt-in kwarg.  Default behaviour
  unchanged -- sync call() still returns the function's value.
- RemoteCommand explicitly forwards nonBlocking to BaseCommand.__init__
  only; it is not in **kwargs and therefore cannot reach
  RemoteVariable.__init__.
- Worker thread per call (daemon=False, name "{path}.worker").
  Auto-injects three sibling LocalVariables on the parent Device:
  {name}Running (bool), {name}Result (last return value, sticky on
  success), {name}Error (last exception string, cleared at dispatch).
- Re-entry guard uses live thread aliveness (self._thread.is_alive()),
  not the _runEn flag, closing the race where _stopWorker cleared the
  flag before joining.
- Sync-path parity: when the callback accepts an arg and this is a
  LocalCommand, the resolved argument is cached into self._default and
  an update is queued before worker dispatch, so cmd.get() reflects
  the last invocation.
- Sibling-injection collision guard: pre-checks parent._nodes for
  {name}Running / {name}Result / {name}Error before any mutation;
  raises pr.NodeError on conflict so a third-sibling collision never
  leaves a half-injected state.
- Device._stop() joins non-blocking command workers BEFORE stopping
  managed interfaces, mirroring Process._stop calling _stopProcess
  first.  Worker functions that touch interfaces or memory during
  teardown stay safe.
- Device._rootAttached snapshots _nodes before iterating so the
  sibling-variable injection done by child BaseCommand._rootAttached
  cannot raise RuntimeError: dictionary changed size during iteration.
- Worker lifecycle (_runWorker, _stopWorker, _stop) mirrors pr.Process,
  including the self-thread guard that prevents deadlock if the user
  function triggers Root.stop() from within itself.

Cookbook example (python/pyrogue/examples/_NonBlockingCommandExample.py)

- ApplyConfig non-blocking LocalCommand plus SimpleClient hand-rolled
  Running / Result / Error poll loop.
- main() raises TimeoutError on deadline expiry to match its documented
  behaviour.
- Exported from the pyrogue.examples barrel (examples/__init__.py).
- Runnable headlessly (python -m
  pyrogue.examples._NonBlockingCommandExample); also used as the
  literalinclude source for the docs polling-pattern snippet.

Documentation (docs/src/)

- New Sphinx page
  pyrogue_tree/builtin_devices/long_running_operations.rst -- decision
  rule, side-by-side comparison table (pr.Process vs
  Command(nonBlocking=True)), parallel recipes, client-side polling
  pattern, and an explicit "Interaction with linkTimeout and
  requestStallTimeout" section explaining that linkTimeout is an idle
  timeout and that requestStallTimeout defaults to disabled.
- versionadded:: 6.14.0 on the new page and on the nonBlocking
  parameter in BaseCommand's docstring.
- :ref: cross-references inside BaseCommand.nonBlocking (_Command.py)
  and VirtualClient.linkTimeout / requestStallTimeout (_Virtual.py) so
  autoclass and help() surface the link next to the relevant parameter.
- Related Topics / What To Explore Next cross-links added to
  process.rst, core/command.rst, core/local_command.rst,
  client_interfaces/virtual.rst, client_interfaces/simple.rst, and
  builtin_devices/index.rst so a #1243-style reader lands on the new
  page from any entry point.

Tests

- tests/core/test_core_command_nonblocking_construction.py --
  backward-compat construction, RemoteCommand(nonBlocking=True)
  attaches cleanly, sibling-injection collision raises pr.NodeError.
- tests/interfaces/test_interfaces_command_nonblocking.py --
  happy-path, re-entry rejection, Root.stop() clean-join (no .worker
  thread survives shutdown), error path (str(exc) in Error,
  Running=False, Result unchanged).
- tests/interfaces/test_interfaces_simple_client_nonblocking.py --
  SimpleClient.exec returns in well under 1 s for a multi-second
  non-blocking command; polls ApplyConfigRunning via full Top.Dev.*
  paths.

Compatibility

- Default BaseCommand behaviour unchanged.  nonBlocking=True is opt-in.
- No new C++ dependencies.  Pure-Python worker, mirroring pr.Process.
- No changes to existing examples or downstream consumers
  (python/pyrogue/examples/, python/pyrogue/pydm/) unless explicitly
  migrated.
- Targeted at v6.14.0 -- does not block RC #1240 / v6.13.1.
@ruck314 ruck314 marked this pull request as ready for review May 28, 2026 04:55

@tristpinsm tristpinsm left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a very thorough implementation of this feature. One potential request: Would it make sense to make the blocking/non-blocking behaviour an argument to the call method itself? Whether or not a blocking return is desired is typically more a function of the context where it is called rather than the nature of the command itself. I think this is more similar to what was available in the EPICS interface as well.

@ruck314

ruck314 commented May 31, 2026

Copy link
Copy Markdown
Contributor Author

Closing this for now (reviewing whether the feature is needed before moving forward)

@ruck314 ruck314 closed this May 31, 2026
@ruck314 ruck314 deleted the issues/1243 branch June 2, 2026 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants