Skip to content

protocols/rssi: clamp peer-advertised connection parameters#1246

Closed
ruck314 wants to merge 1 commit into
pre-releasefrom
rssi-peer-param-clamp
Closed

protocols/rssi: clamp peer-advertised connection parameters#1246
ruck314 wants to merge 1 commit into
pre-releasefrom
rssi-peer-param-clamp

Conversation

@ruck314

@ruck314 ruck314 commented May 29, 2026

Copy link
Copy Markdown
Contributor

Summary

Hardens the rogue RSSI endpoint by clamping peer-advertised connection parameters to the locally legal range during the SYN/SYN-ACK handshake. This is defense-in-depth parity with the firmware-side validation added in surf #1427 (RssiConnFsm.vhd validPeerParams / clampWindowSize / clampBufferSize).

Controller::stateClosedWait() previously adopted the peer's advertised parameters verbatim. An out-of-range peer value could drive the controller into illegal state — most concretely, curMaxBuffers == 0 makes the applicationRx() outbound wait loop (while (txListCount_ >= curMaxBuffers_)) spin forever. The new Controller::clampPeerParams() clamps in place:

Parameter Clamp
curMaxBuffers_ [1, locMaxBuffers_]
curMaxSegment_ [HeaderSize(8), locMaxSegment_]
curCumAckTout_, curRetranTout_, curNullTout_ >= 1

Clamp, not reject — the connection is never torn down, preserving backward/forward compatibility with conformant peers. SLAC firmware never advertises invalid parameters (and surf#1427 makes the FW self-clamp), so in practice this only engages against a malformed/buggy peer, where it now logs a warning instead of hanging.

Testing

  • New tests/cpp/protocols/rssi/test_param_clamp.cpp (doctest, cpp-core/no-python) exercises every clamp branch plus the no-op case for legal input. Auto-run by the existing ctest -L cpp / -L no-python CI steps — no workflow changes needed.
  • Local: full ctest -L cpp (13/13) and tests/integration/test_rssi_loopback.py (4/4) pass — the loopback negotiates legal parameters, so the clamp is a verified no-op on the happy path (behavior identical to pre-release).
  • cpplint clean on changed files.

Context: audit of surf#1427

This change came out of an audit of surf#1427 (RSSI RTL fixes + cocotb regression suite). Findings:

  • The PR #1427 production RTL changes (RssiAxiLiteRegItf, RssiConnFsm, RssiCore, RssiMonitor, RssiRxFsm, RssiTxFsm) are valid by analysis and narrow. rogue's RSSI C++ is byte-identical to pre-release and requires no functional change to interoperate — the new strict FW receiver plus rogue's existing out-of-order buffering recover from every gap scenario via retransmit.
  • This clamp is the one defensible hardening surfaced by the audit. It is not required for interop; it is robustness only.
  • Note: surf#1427's validation is cocotb-vs-Python-oracle only. The RssiRxFsm registered-RAM payload length/timing rewrite in that PR is its highest-risk change and is worth confirming on hardware before merge.

Deferred: Candidate 2 (NULL-while-data-outstanding suppression) — intentionally not done

surf#1427 also makes the FW suppress NULL while its transmit buffer is non-empty (RssiTxFsm). A symmetric change in rogue (suppress NULL while txListCount_ > 0) was evaluated and deliberately deferred as high-risk / no-reward against currently deployed firmware:

  • No-op in the normal case. While data is outstanding and the peer is not busy, rogue retransmits the oldest unacked frame every ~20 ms (stateOpen()), which refreshes txTime_, so the ~333 ms NULL timer never fires anyway. Suppression changes nothing here.
  • Its only active effect is during a sustained remote-BUSY stall, where the periodic NULL is the sole keepalive. The firmware has a receive-side null-timeout; rogue does not (asymmetry — rogue infers peer death only via retransmit-max). Suppressing rogue's NULL there would let the FW (old and new) RST the link after ~1 null-timeout (~1 s default) of backpressure in the software→FPGA direction — a link bounce + reconnect, not a graceful retransmit recovery.
  • The benefit cannot occur against current firmware. The "NULL advances past lost DATA" failure it guards against requires a non-strict receiver; surf#1427's new RssiRxFsm is strict (drops duplicate/out-of-order) and rogue's receiver buffers out-of-order correctly.

Recommendation: revisit Candidate 2 once surf#1427 firmware is widespread in production. At that point the FW's periodic BUSY-ACK behavior (also added in #1427) makes the parity change low-risk to roll out.

Checklist

  • Reviewer confirms the clamp-vs-reject choice and the bounds.
  • Hardware/interop validation against surf#1427 firmware (out of scope here — no FW↔rogue co-sim exists).

The RSSI SYN/SYN-ACK handshake adopted the peer's advertised parameters
verbatim. An out-of-range peer value (zero outstanding segments, zero
timeouts, a sub-header-size segment, or a window larger than the locally
offered maximum) could drive the controller into illegal state -- for
example curMaxBuffers == 0 makes the applicationRx outbound wait loop
spin forever.

Add Controller::clampPeerParams() and apply it in stateClosedWait() after
the peer parameters are adopted, clamping to the locally legal range
rather than rejecting the connection. This preserves backward/forward
compatibility with conformant peers (including SLAC firmware, which never
advertises invalid parameters) while hardening against a malformed SYN.
Mirrors the firmware validPeerParams/clampWindowSize/clampBufferSize
checks in surf protocols/rssi/v1/rtl/RssiConnFsm.vhd.

Add a doctest covering the clamp helper (cpp-core, no-python), wired into
the existing tests/cpp ctest suite.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens the rogue RSSI controller by clamping peer-advertised connection parameters (maxBuffers, maxSegment, and the three timeouts) to the locally legal range during SYN/SYN-ACK handshake. This is defense-in-depth parity with surf#1427's firmware-side validation, preventing pathological states such as curMaxBuffers == 0 causing the applicationRx() wait loop to spin forever. The clamp logs a warning rather than tearing down the connection, preserving compatibility with conformant peers.

Changes:

  • Add static Controller::clampPeerParams() helper and invoke it from stateClosedWait() between peer parameter adoption and timer conversion.
  • Add doctest-based unit tests covering the no-op path and each clamp branch (zero/oversized buffers, sub-header/oversized segment, zero timeouts, all-at-once).
  • Wire the new test directory through tests/cpp/protocols/CMakeLists.txt.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
include/rogue/protocols/rssi/Controller.h Declares the new static clampPeerParams() helper with detailed doxygen.
src/rogue/protocols/rssi/Controller.cpp Implements clampPeerParams() and calls it in stateClosedWait() after adopting peer SYN fields.
tests/cpp/protocols/rssi/test_param_clamp.cpp New doctest suite exercising each clamp branch and the conformant no-op case.
tests/cpp/protocols/rssi/CMakeLists.txt Registers the new test target with cpp-core/no-python labels.
tests/cpp/protocols/CMakeLists.txt Adds the rssi subdirectory to the protocols test build.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ruck314 ruck314 marked this pull request as ready for review May 29, 2026 11:20
@ruck314 ruck314 requested a review from bengineerd May 29, 2026 11:36
@ruck314

ruck314 commented May 31, 2026

Copy link
Copy Markdown
Contributor Author

Closing this for now (reviewing whether the feature is needed before moving forward)

@ruck314 ruck314 closed this May 31, 2026
@ruck314 ruck314 deleted the rssi-peer-param-clamp branch June 2, 2026 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants