Release Candidate v6.13.1#1240
Closed
ruck314 wants to merge 28 commits into
Closed
Conversation
Override XDG_SESSION_TYPE to x11 before Qt initialization so Qt does not emit the noisy "Ignoring XDG_SESSION_TYPE=wayland on Gnome" warning. The override is conditional and only fires on Wayland sessions.
…mpatibility SqlReader used deprecated 1.x APIs (bound MetaData, autoload=True, list-wrapped select) and referenced an undefined self._conn attribute. Replace with 2.x-compatible patterns (unbound MetaData, autoload_with, engine.connect() context managers) that also work on 1.4+. Add round-trip tests that write via SqlLogger and read via SqlReader.
…1/V2 SSI ordering - Application / Transport: raw std::thread* member -> std::unique_ptr<std::thread>; remove manual new / delete. - setController(): wrap std::thread spawn in try/catch; roll back threadEn_ and cntl_ if construction throws so the instance stays in its retryable pre-call state. Idempotent guard so a second call is a no-op (without it, replacing a still-joinable unique_ptr<std::thread> would invoke std::terminate via ~thread()). - ControllerV1: move SSI setError(0x80) to BEFORE pushFrame() and tranFrame_[0].reset(), and apply it to tranFrame_[0] (the assembled frame) instead of tranFrame_[tmpDest]. Original code dereferenced a null shared_ptr for any tmpDest != 0, and silently dropped the SSI flag for tmpDest == 0. - ControllerV2: same setError ordering fix - apply before push + reset so the downstream slave observes the error flag instead of the call hitting a just-reset (null) slot.
…s atomic Concurrent first-callers could observe partially-initialised state. Declares Version::init()'s statics atomic and guards the one-time initialisation with std::call_once.
Remove tests/protocols/test_batcher_audit_repros.py — the file walked InverterV1.cpp / InverterV2.cpp source text looking for bounds-check tokens before the memcpy, rather than functionally exercising the patched code paths. Per maintainer feedback that style of test is too brittle to be useful as CI; the audit/guardrail intent is better expressed as an AI agent directive at PR-review time.
fix(version): Version::init() race fix
fix(pyrogue::interfaces): SQLAlchemy 1.4+/2.x compatibility for SqlReader
Suppress Qt Wayland warning in PyDM launcher
fix(batcher): InverterV1, InverterV2 bounds-check audit fixes + repros
fix(packetizer): RAII worker threads, exception-safe setController, V1/V2 SSI ordering
ruck314
added a commit
that referenced
this pull request
May 28, 2026
Add an opt-in `nonBlocking=True` kwarg on `BaseCommand` so long-running
commands invoked over ZMQ (`SimpleClient.exec` / `VirtualClient`) no
longer tie the client up past `linkTimeout` / `requestStallTimeout`.
`pr.Process` and `Command(nonBlocking=True)` are now the two canonical
patterns for long-running work, documented side-by-side.
Core API (python/pyrogue/_Command.py, python/pyrogue/_Device.py)
- BaseCommand(nonBlocking=True) opt-in kwarg. Default behaviour
unchanged -- sync call() still returns the function's value.
- RemoteCommand explicitly forwards nonBlocking to BaseCommand.__init__
only; it is not in **kwargs and therefore cannot reach
RemoteVariable.__init__.
- Worker thread per call (daemon=False, name "{path}.worker").
Auto-injects three sibling LocalVariables on the parent Device:
{name}Running (bool), {name}Result (last return value, sticky on
success), {name}Error (last exception string, cleared at dispatch).
- Re-entry guard uses live thread aliveness (self._thread.is_alive()),
not the _runEn flag, closing the race where _stopWorker cleared the
flag before joining.
- Sync-path parity: when the callback accepts an arg and this is a
LocalCommand, the resolved argument is cached into self._default and
an update is queued before worker dispatch, so cmd.get() reflects
the last invocation.
- Sibling-injection collision guard: pre-checks parent._nodes for
{name}Running / {name}Result / {name}Error before any mutation;
raises pr.NodeError on conflict so a third-sibling collision never
leaves a half-injected state.
- Device._stop() joins non-blocking command workers BEFORE stopping
managed interfaces, mirroring Process._stop calling _stopProcess
first. Worker functions that touch interfaces or memory during
teardown stay safe.
- Device._rootAttached snapshots _nodes before iterating so the
sibling-variable injection done by child BaseCommand._rootAttached
cannot raise RuntimeError: dictionary changed size during iteration.
- Worker lifecycle (_runWorker, _stopWorker, _stop) mirrors pr.Process,
including the self-thread guard that prevents deadlock if the user
function triggers Root.stop() from within itself.
Cookbook example (python/pyrogue/examples/_NonBlockingCommandExample.py)
- ApplyConfig non-blocking LocalCommand plus SimpleClient hand-rolled
Running / Result / Error poll loop.
- main() raises TimeoutError on deadline expiry to match its documented
behaviour.
- Exported from the pyrogue.examples barrel (examples/__init__.py).
- Runnable headlessly (python -m
pyrogue.examples._NonBlockingCommandExample); also used as the
literalinclude source for the docs polling-pattern snippet.
Documentation (docs/src/)
- New Sphinx page
pyrogue_tree/builtin_devices/long_running_operations.rst -- decision
rule, side-by-side comparison table (pr.Process vs
Command(nonBlocking=True)), parallel recipes, client-side polling
pattern, and an explicit "Interaction with linkTimeout and
requestStallTimeout" section explaining that linkTimeout is an idle
timeout and that requestStallTimeout defaults to disabled.
- versionadded:: 6.14.0 on the new page and on the nonBlocking
parameter in BaseCommand's docstring.
- :ref: cross-references inside BaseCommand.nonBlocking (_Command.py)
and VirtualClient.linkTimeout / requestStallTimeout (_Virtual.py) so
autoclass and help() surface the link next to the relevant parameter.
- Related Topics / What To Explore Next cross-links added to
process.rst, core/command.rst, core/local_command.rst,
client_interfaces/virtual.rst, client_interfaces/simple.rst, and
builtin_devices/index.rst so a #1243-style reader lands on the new
page from any entry point.
Tests
- tests/core/test_core_command_nonblocking_construction.py --
backward-compat construction, RemoteCommand(nonBlocking=True)
attaches cleanly, sibling-injection collision raises pr.NodeError.
- tests/interfaces/test_interfaces_command_nonblocking.py --
happy-path, re-entry rejection, Root.stop() clean-join (no .worker
thread survives shutdown), error path (str(exc) in Error,
Running=False, Result unchanged).
- tests/interfaces/test_interfaces_simple_client_nonblocking.py --
SimpleClient.exec returns in well under 1 s for a multi-second
non-blocking command; polls ApplyConfigRunning via full Top.Dev.*
paths.
Compatibility
- Default BaseCommand behaviour unchanged. nonBlocking=True is opt-in.
- No new C++ dependencies. Pure-Python worker, mirroring pr.Process.
- No changes to existing examples or downstream consumers
(python/pyrogue/examples/, python/pyrogue/pydm/) unless explicitly
migrated.
- Targeted at v6.14.0 -- does not block RC #1240 / v6.13.1.
Root.waitOnUpdate() previously called Queue.join(), which only returns when the unfinished-task counter reaches zero. updateGroup() is per-thread, so under sustained producer load (poll thread, listener-fired sets, UDP listeners) the counter never settles and the call could silently take minutes - long enough to trip the ZMQ client linkTimeout. Reported by Tristan on the LAT/SO rogue 6 deployments, where `with updateGroup(): waitOnUpdate()` occasionally took >2 min between the metadata-frame setup and the open_g3stream trigger. Replace Queue.join() with a barrier: push a threading.Event sentinel into the update queue and wait for the worker to set it. The worker recognizes the sentinel, signals it, calls task_done(), and continues. Items queued during the wait are not included - the wait is bounded by queue contents at the moment of the call, which is exactly the flush-before-next-set semantics callers actually need. Race guard: stop() can flip _running and enqueue the None stop sentinel between the _running check and the barrier put(). If the barrier ends up behind the stop sentinel the worker exits before reaching it, leaving barrier.wait() blocked forever. Wait with a short timeout and re-check _running so the call returns after stop(). Adds tests/core/test_wait_on_update_bounded.py (5 cases) covering idle queue, concurrent producer thread, the with updateGroup() user pattern, listener self-storm, and the listener-callback-drains contract. The listener-self-storm case re-fires the same variable from its own listener (gated by a stop event) so each callback queues another update from the worker thread, exercising the unbounded feedback loop the bounded barrier is meant to absorb. The two producer-thread cases hang (pytest 15s timeout) on unfixed code and pass in ~5s after the fix; full tests/core/ (273 tests) and tests/interfaces/ (116 + 5 skipped) green.
fix(pyrogue): bound waitOnUpdate() with a barrier sentinel
Contributor
Author
|
Closing this for now (reviewing whether the feature is needed before moving forward) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description