Fix CLIENT-mode runlevel/sync and rec ring buffer wrap#183
Merged
Conversation
* getRunLevel() falls back to the SYSINFO mirror in shmem so CLIENT sessions report the device runlevel even without a fresh SYSREP packet flowing through their receive ring (the STANDALONE owner only sees SYSREPs during its handshake). * sync() CLIENT path uses getRunLevel() for the no-op runlevel set (previously sent runlevel=0 from the stale atomic) and waits specifically on cbPKTTYPE_SYSREPRUNLEV (0x12) via a sticky received_sysrepRunlev flag, so periodic 0x10 SYSREP heartbeats from nPlayServer can no longer falsely satisfy the wait. * writeToReceiveBuffer pads end-of-buffer wrap gaps with a synthetic chid=0/type=0/dlen!=0 marker so CLIENT readers can advance through the gap cleanly. Previously the reader had no way to detect the gap and would land inside it after the first wrap, decoding garbage as fake SYSREP packets and emitting spurious runlevel-change events. Adds release/acquire ordering on head_index for ARM weak-memory correctness. * pycbsdk regression tests in test_client_mode.py covering CLIENT runlevel, sync correctness, and packet integrity through a forced rec-buffer wrap. * simple_device.cpp: optional duration argument and prints standalone/protocol/proc-ident/runlevel after session creation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The release/acquire ordering on rec ring buffer head_index/head_wrap used GCC's __atomic_* builtins, which MSVC doesn't expose. Wrap the four call sites in small inline helpers that branch between __atomic_* on GCC/Clang and std::atomic_thread_fence + volatile load/store on MSVC. Pre-existing __atomic_* uses in the xmt buffer were already #ifdef _WIN32-guarded with InterlockedExchange, so this just brings the rec-buffer additions to parity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three independent bugs that combined to make CLIENT-mode sessions unreliable on long-running streams. Each is fixable in isolation; together they make
pycbsdkandsimple_deviceproduce correct runlevel, workingsync(), and clean packet streams across rec ring buffer wraps.Bugs and fixes
1.
getRunLevel()returned 0 in CLIENT modeSymptom:
pycbsdk/examples/session_info.py NPLAY(andclient_session.runlevelin user code) reported0even though the device was running.simple_device NPLAYattached as CLIENT showedRunlevel: 0 (UNKNOWN).Cause:
SdkSession::getRunLevel()only read the per-sessiondevice_runlevelatomic, which is updated by the receive thread when a SYSREP packet flows through that session's receive ring. Devices only emit SYSREP in response to runlevel-set commands, so once the STANDALONE owner finished its handshake, no further SYSREPs flowed and CLIENT-sidedevice_runlevelstayed at 0 indefinitely.Fix:
getRunLevel()falls back togetSysInfo()->runlevel(the SYSINFO mirror in shmem that the STANDALONE owner writes viasetSysInfo()on every received SYSREP).2. CLIENT-mode
sync()raced with SYSREP heartbeatsSymptom:
sync()could return before the device had actually processed prior config packets, so subsequent read-back saw stale state. Worse: it sent SYSSETRUNLEV withrunlevel=0(stale atomic), which can perturb device state.Cause:
sync()readdevice_runlevel.load()directly for the "no-op" runlevel set. In CLIENT mode that's still 0.sync()waited on any 0x10..0x1F SYSREP via the booleanreceived_sysrepflag. Periodic SYSREP (0x10) heartbeats from nPlayServer satisfied the wait before the actual SYSREPRUNLEV (0x12) reply arrived.Fix:
sync()usesgetRunLevel()(with the new shmem fallback) forcurrent.received_sysrepRunlevflag inImpl, set only when the receive thread sees a 0x12 packet.setSystemRunLevelresets it before send.waitForSysrepaccepts an optionalexpected_typeargument; CLIENT-pathsync()passescbPKTTYPE_SYSREPRUNLEV. Sticky semantics are race-free against later 0x10 heartbeats arriving before the waiter wakes.3. Rec ring buffer wrap left an unmarked gap
Symptom: ~6–17 s into any CLIENT session, the receive thread started decoding garbage as fake SYSREP-family packets, firing a flood of
[runlevel change]events with nonsense runlevels and inflating delivered-packet counts. Once misaligned, the reader stayed misaligned for the rest of the session.Cause: When the writer would not fit the next packet at
head, it wrapped to offset 0 and incrementedhead_wrap— but left a 1+ dword gap between the previous packet's end andbuflenwith no marker. The reader had no way to know about this gap, so it tried to read a "packet" from inside it, got garbagedlen, advancedtailby some incorrect amount, and stayed off-aligned thereafter.Fix:
writeToReceiveBufferpads the gap with a synthetic wrap-marker packet (chid=0, type=0, dlen != 0— a combination no real packet uses). The marker fills the gap exactly so the reader can step over it cleanly. Wrap policy also avoids leaving 1..3-dword gaps that can't fit a marker header.readReceiveBufferrecognizes the marker and silently advancestailindexpast it without delivering the marker to user callbacks.__atomic_store_n(..., __ATOMIC_RELEASE)onhead_indexand__atomic_load_n(..., __ATOMIC_ACQUIRE)on the reader side so packet bytes are guaranteed visible before the consumer sees the new head index. Required for correctness on ARM/Apple Silicon weak memory.Diagnostics path
The garbage runlevel symptom drove the investigation. A debug print added to
readReceiveBuffercaught the first misaligned read and showedtailindex=1 tailwrap=1immediately after the first wrap — confirming the reader landed 1 dword off zero, not at offset 0. Tracing the writer revealed the unmarked gap.Backward compatibility
chid=0, type=0, dlen != 0. No real packet uses this combination (heartbeats havechid=cbPKTCHAN_CONFIGURATION=0x8000; group sample packets havechid=0withtypein 1..6).