Release Candidate v6.13.1 by ruck314 · Pull Request #1240 · slaclab/rogue

ruck314 · 2026-05-18T20:08:07Z

Description

…Frame

Override XDG_SESSION_TYPE to x11 before Qt initialization so Qt does not emit the noisy "Ignoring XDG_SESSION_TYPE=wayland on Gnome" warning. The override is conditional and only fires on Wayland sessions.

…mpatibility SqlReader used deprecated 1.x APIs (bound MetaData, autoload=True, list-wrapped select) and referenced an undefined self._conn attribute. Replace with 2.x-compatible patterns (unbound MetaData, autoload_with, engine.connect() context managers) that also work on 1.4+. Add round-trip tests that write via SqlLogger and read via SqlReader.

…ram in SqlReader

…1/V2 SSI ordering - Application / Transport: raw std::thread* member -> std::unique_ptr<std::thread>; remove manual new / delete. - setController(): wrap std::thread spawn in try/catch; roll back threadEn_ and cntl_ if construction throws so the instance stays in its retryable pre-call state. Idempotent guard so a second call is a no-op (without it, replacing a still-joinable unique_ptr<std::thread> would invoke std::terminate via ~thread()). - ControllerV1: move SSI setError(0x80) to BEFORE pushFrame() and tranFrame_[0].reset(), and apply it to tranFrame_[0] (the assembled frame) instead of tranFrame_[tmpDest]. Original code dereferenced a null shared_ptr for any tmpDest != 0, and silently dropped the SSI flag for tmpDest == 0. - ControllerV2: same setError ordering fix - apply before push + reset so the downstream slave observes the error flag instead of the call hitting a just-reset (null) slot.

…s atomic Concurrent first-callers could observe partially-initialised state. Declares Version::init()'s statics atomic and guards the one-time initialisation with std::call_once.

Remove tests/protocols/test_batcher_audit_repros.py — the file walked InverterV1.cpp / InverterV2.cpp source text looking for bounds-check tokens before the memcpy, rather than functionally exercising the patched code paths. Per maintainer feedback that style of test is too brittle to be useful as CI; the audit/guardrail intent is better expressed as an AI agent directive at PR-review time.

fix(version): Version::init() race fix

fix(pyrogue::interfaces): SQLAlchemy 1.4+/2.x compatibility for SqlReader

Suppress Qt Wayland warning in PyDM launcher

fix(batcher): InverterV1, InverterV2 bounds-check audit fixes + repros

fix(packetizer): RAII worker threads, exception-safe setController, V1/V2 SSI ordering

Add an opt-in `nonBlocking=True` kwarg on `BaseCommand` so long-running commands invoked over ZMQ (`SimpleClient.exec` / `VirtualClient`) no longer tie the client up past `linkTimeout` / `requestStallTimeout`. `pr.Process` and `Command(nonBlocking=True)` are now the two canonical patterns for long-running work, documented side-by-side. Core API (python/pyrogue/_Command.py, python/pyrogue/_Device.py) - BaseCommand(nonBlocking=True) opt-in kwarg. Default behaviour unchanged -- sync call() still returns the function's value. - RemoteCommand explicitly forwards nonBlocking to BaseCommand.__init__ only; it is not in **kwargs and therefore cannot reach RemoteVariable.__init__. - Worker thread per call (daemon=False, name "{path}.worker"). Auto-injects three sibling LocalVariables on the parent Device: {name}Running (bool), {name}Result (last return value, sticky on success), {name}Error (last exception string, cleared at dispatch). - Re-entry guard uses live thread aliveness (self._thread.is_alive()), not the _runEn flag, closing the race where _stopWorker cleared the flag before joining. - Sync-path parity: when the callback accepts an arg and this is a LocalCommand, the resolved argument is cached into self._default and an update is queued before worker dispatch, so cmd.get() reflects the last invocation. - Sibling-injection collision guard: pre-checks parent._nodes for {name}Running / {name}Result / {name}Error before any mutation; raises pr.NodeError on conflict so a third-sibling collision never leaves a half-injected state. - Device._stop() joins non-blocking command workers BEFORE stopping managed interfaces, mirroring Process._stop calling _stopProcess first. Worker functions that touch interfaces or memory during teardown stay safe. - Device._rootAttached snapshots _nodes before iterating so the sibling-variable injection done by child BaseCommand._rootAttached cannot raise RuntimeError: dictionary changed size during iteration. - Worker lifecycle (_runWorker, _stopWorker, _stop) mirrors pr.Process, including the self-thread guard that prevents deadlock if the user function triggers Root.stop() from within itself. Cookbook example (python/pyrogue/examples/_NonBlockingCommandExample.py) - ApplyConfig non-blocking LocalCommand plus SimpleClient hand-rolled Running / Result / Error poll loop. - main() raises TimeoutError on deadline expiry to match its documented behaviour. - Exported from the pyrogue.examples barrel (examples/__init__.py). - Runnable headlessly (python -m pyrogue.examples._NonBlockingCommandExample); also used as the literalinclude source for the docs polling-pattern snippet. Documentation (docs/src/) - New Sphinx page pyrogue_tree/builtin_devices/long_running_operations.rst -- decision rule, side-by-side comparison table (pr.Process vs Command(nonBlocking=True)), parallel recipes, client-side polling pattern, and an explicit "Interaction with linkTimeout and requestStallTimeout" section explaining that linkTimeout is an idle timeout and that requestStallTimeout defaults to disabled. - versionadded:: 6.14.0 on the new page and on the nonBlocking parameter in BaseCommand's docstring. - :ref: cross-references inside BaseCommand.nonBlocking (_Command.py) and VirtualClient.linkTimeout / requestStallTimeout (_Virtual.py) so autoclass and help() surface the link next to the relevant parameter. - Related Topics / What To Explore Next cross-links added to process.rst, core/command.rst, core/local_command.rst, client_interfaces/virtual.rst, client_interfaces/simple.rst, and builtin_devices/index.rst so a #1243-style reader lands on the new page from any entry point. Tests - tests/core/test_core_command_nonblocking_construction.py -- backward-compat construction, RemoteCommand(nonBlocking=True) attaches cleanly, sibling-injection collision raises pr.NodeError. - tests/interfaces/test_interfaces_command_nonblocking.py -- happy-path, re-entry rejection, Root.stop() clean-join (no .worker thread survives shutdown), error path (str(exc) in Error, Running=False, Result unchanged). - tests/interfaces/test_interfaces_simple_client_nonblocking.py -- SimpleClient.exec returns in well under 1 s for a multi-second non-blocking command; polls ApplyConfigRunning via full Top.Dev.* paths. Compatibility - Default BaseCommand behaviour unchanged. nonBlocking=True is opt-in. - No new C++ dependencies. Pure-Python worker, mirroring pr.Process. - No changes to existing examples or downstream consumers (python/pyrogue/examples/, python/pyrogue/pydm/) unless explicitly migrated. - Targeted at v6.14.0 -- does not block RC #1240 / v6.13.1.

Root.waitOnUpdate() previously called Queue.join(), which only returns when the unfinished-task counter reaches zero. updateGroup() is per-thread, so under sustained producer load (poll thread, listener-fired sets, UDP listeners) the counter never settles and the call could silently take minutes - long enough to trip the ZMQ client linkTimeout. Reported by Tristan on the LAT/SO rogue 6 deployments, where `with updateGroup(): waitOnUpdate()` occasionally took >2 min between the metadata-frame setup and the open_g3stream trigger. Replace Queue.join() with a barrier: push a threading.Event sentinel into the update queue and wait for the worker to set it. The worker recognizes the sentinel, signals it, calls task_done(), and continues. Items queued during the wait are not included - the wait is bounded by queue contents at the moment of the call, which is exactly the flush-before-next-set semantics callers actually need. Race guard: stop() can flip _running and enqueue the None stop sentinel between the _running check and the barrier put(). If the barrier ends up behind the stop sentinel the worker exits before reaching it, leaving barrier.wait() blocked forever. Wait with a short timeout and re-check _running so the call returns after stop(). Adds tests/core/test_wait_on_update_bounded.py (5 cases) covering idle queue, concurrent producer thread, the with updateGroup() user pattern, listener self-storm, and the listener-callback-drains contract. The listener-self-storm case re-fires the same variable from its own listener (gated by a stop event) so each callback queues another update from the worker thread, exercising the unbounded feedback loop the bounded barrier is meant to absorb. The two producer-thread cases hang (pytest 15s timeout) on unfixed code and pass in ~5s after the fix; full tests/core/ (273 tests) and tests/interfaces/ (116 + 5 skipped) green.

fix(pyrogue): bound waitOnUpdate() with a barrier sentinel

ruck314 · 2026-05-31T19:31:16Z

Closing this for now (reviewing whether the feature is needed before moving forward)

ruck314 and others added 26 commits May 6, 2026 09:18

fix(batcher): add frame-payload bounds checks to InverterV1/V2 accept…

b17f9e9

…Frame

Suppress Qt Wayland warning in PyDM launcher

f2cf2f4

Override XDG_SESSION_TYPE to x11 before Qt initialization so Qt does not emit the noisy "Ignoring XDG_SESSION_TYPE=wayland on Gnome" warning. The override is conditional and only fires on Wayland sessions.

fix(pyrogue::interfaces): add engine-None guards and remove unused pa…

33d251e

…ram in SqlReader

fix(version): guard Version::init() with call_once and declare static…

c585057

…s atomic Concurrent first-callers could observe partially-initialised state. Declares Version::init()'s statics atomic and guards the one-time initialisation with std::call_once.

Merge branch 'pre-release' into bug-audit/packetizer

61d097b

Merge branch 'pre-release' into bug-audit/batcher

958f7d8

Merge branch 'pre-release' into bug-audit/version

2b9f7f1

Merge branch 'pre-release' into XDG_SESSION_TYPE

3879558

Merge branch 'pre-release' into SQLAlchemy

28ba628

Merge branch 'pre-release' into SQLAlchemy

7e6c6d4

Merge branch 'pre-release' into XDG_SESSION_TYPE

fa40435

Merge branch 'pre-release' into bug-audit/version

1838b30

Merge branch 'pre-release' into bug-audit/batcher

6468ab9

Merge branch 'pre-release' into bug-audit/packetizer

8ad14b6

Merge pull request #1219 from slaclab/bug-audit/version

cded749

fix(version): Version::init() race fix

Merge branch 'pre-release' into SQLAlchemy

d1dd007

Merge pull request #1228 from slaclab/SQLAlchemy

23ad6e7

fix(pyrogue::interfaces): SQLAlchemy 1.4+/2.x compatibility for SqlReader

Merge branch 'pre-release' into XDG_SESSION_TYPE

f1561a3

Merge pull request #1227 from slaclab/XDG_SESSION_TYPE

9906f65

Suppress Qt Wayland warning in PyDM launcher

Merge branch 'pre-release' into bug-audit/batcher

aa457fd

Merge pull request #1214 from slaclab/bug-audit/batcher

7acbcf5

fix(batcher): InverterV1, InverterV2 bounds-check audit fixes + repros

Merge branch 'pre-release' into bug-audit/packetizer

5d0971f

Merge pull request #1203 from slaclab/bug-audit/packetizer

20cbb7e

fix(packetizer): RAII worker threads, exception-safe setController, V1/V2 SSI ordering

ruck314 mentioned this pull request May 28, 2026

feat: opt-in nonBlocking Command for long-running ZMQ operations (#1243) #1244

Closed

ruck314 added 2 commits May 28, 2026 00:13

Merge pull request #1245 from slaclab/bugfix/wait-on-update-hang

8bebe44

fix(pyrogue): bound waitOnUpdate() with a barrier sentinel

ruck314 closed this May 31, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Release Candidate v6.13.1#1240

Release Candidate v6.13.1#1240
ruck314 wants to merge 28 commits into
mainfrom
pre-release

ruck314 commented May 18, 2026 •

edited

Loading

Uh oh!

ruck314 commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ruck314 commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Uh oh!

ruck314 commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ruck314 commented May 18, 2026 •

edited

Loading