perf: device-task hot-path cleanups and dedicated rclrs spin thread by gsokoll · Pull Request #13 · greenforge-labs/oxide_gnss

gsokoll · 2026-05-13T10:37:56Z

Four perf commits plus a CHANGELOG entry for v0.2.0. Reduces release binary size by ~22% and steady-state CPU by 35–50% across all five user-facing modes. No public API or topic-content changes.

Commits

dafab66 perf: tactical CPU/memory optimisations (release profile, atomics, buffers)
ba01b89 perf: drop internal-only UBX payloads from the device channel
c3f7a05 perf: eliminate remaining hot-path clones in the device task
d52a97b perf: run the rclrs executor on a dedicated thread instead of polling
642ce62 docs: v0.2.0 CHANGELOG entry

What changed

Device-task hot path (dafab66, ba01b89, c3f7a05). An explicit [profile.release] block (fat LTO, codegen-units = 1, strip = "debuginfo") accounts for the binary size reduction. The clone-elimination commits remove ~9 per-epoch heap operations at 5 Hz worst-case load: internal-only UBX variants (NAV-COV, NAV-POSECEF, SEC-SIGLOG, RXM-COR, MON-COMMS/HW/RF, FixTypeChanged) no longer cross the mpsc channel; the remaining heavy payloads (SatInfo, SecSig, RelPosNed, SurveyIn) are moved into the channel send rather than cloned; seven write-only retained-message fields in IntegrityAggregator were removed. Other tactical: Vec::with_capacity in the UBX parsers, AtomicU64/U32 counter relocation out from under the async Mutex in DeviceTask/NtripTask, serial read buffer 1 KiB → 4 KiB, tokio features trimmed from "full" to ["rt-multi-thread", "macros", "sync", "time", "signal"].

Dedicated rclrs spin thread (d52a97b). Replaces the polling loop driving the rclrs executor (executor.spin(SpinOptions::spin_once().timeout(100ms)) + sleep(10ms) on the main tokio thread) with a dedicated OS thread running spin(SpinOptions::default()), which blocks in rcl_wait and returns the instant a callback is ready. Shutdown uses SpinOptions::until_promise_resolved (the rclrs mechanism that flips halt_spinning and triggers the wait set's guard conditions to wake), wired to a tokio oneshot dropped on shutdown. Effect: the main tokio thread no longer wakes ~9 times per second to drain an empty wait set, and ROS control-plane callbacks (parameter events, service calls, timers) are no longer bounded by the polling cycle. The publish path is unaffected — publisher.publish() is called directly from the tokio side — so data-plane fix-to-subscriber latency is unchanged.

Headline numbers (vs v0.1.0)

Measured across 30–60 min runs at 5 Hz worst-case load (all four constellations dual-band, every feature each mode allows), live hardware A/B on the same antennas:

Binary size: −22.3%
Process memory footprint (peak resident set size): −1.8% to −3.6% (smallest in moving_base, largest in mb_rover base / rover_ntrip)
Steady-state CPU: combining both halves of the work, −35% to −50% across modes; standalone is now ~1% CPU
RSS slope after warmup: 2.9–14 kB/min in every run, candidate and baseline within ~2× of each other — heap noise, not growth
/fix e2e p50: unchanged in standalone/moving_base; rover_ntrip improved (some of which is rmw_zenoh delivery variance run-to-run); tail (p99/max) trends favourable
Output: topic counts match within ±1 message over 30–60 min; no new /rosout WARN/ERROR

All 183 library tests pass unchanged.

Modes covered

standalone, rover_ntrip, moving_base, moving_base_rover, rover_radio. static_base was not exercised — none of these commits touch the survey-in or RTCM-output code paths.

…ffers) Bundle of low-risk changes: - Cargo.toml: add [profile.release] with fat LTO, codegen-units = 1, strip = "debuginfo" for ~5–15% runtime and ~30–60% binary size wins. Trim tokio features from "full" to only the set actually used (rt-multi-thread, macros, sync, time, signal). - UBX parsers: add Vec::with_capacity in NAV-SAT, SEC-SIG, SEC-SIGLOG and MON-COMMS parse paths to avoid the usual 4→8→16→32→64 growth cascade per message. - Device/NTRIP task state: split internal state into a Mutex-guarded struct plus sibling AtomicU64/AtomicU32 counters. Hot-path counter increments (messages_received, rtcm_bytes_injected, bytes_received, messages_forwarded, connection_count) become lock-free fetch_add. The public snapshot type is unchanged. - Device task: bump the serial read buffer from 1 KiB to 4 KiB so that a single USB-CDC frame carrying NAV-SAT + RTCM doesn't force multiple read syscalls and partial-frame parser state. All 183 lib tests pass under cargo test --features ros2.

The device task's UbxProcessResult contains messages that are consumed purely by the integrity aggregator (NAV-COV, NAV-POSECEF, SEC-SIGLOG, RXM-COR, MON-COMMS, MON-HW, MON-RF, and fix-type transitions). They were being cloned, enum-wrapped, sent across the mpsc channel, and then dropped by the forwarding bridge because into_gnss_message() returned None for those variants. Handle them in place: call integrity.update_*() and skip the channel send. This removes 7-8 clones and the corresponding channel sends per GNSS epoch (~70-160 heap operations/sec at 10 Hz) and shrinks both DeviceMessage and the main.rs forwarding bridge. into_gnss_message() now returns GnssMessage directly. The forwarding bridge becomes unconditional. No ROS topic content changes because none of the removed variants was ever published.

After filter-before-send (previous commit), the four heavy payloads that still crossed mpsc::channel — SatInfo, SecSig, RelPosNed, SurveyIn — were being cloned by the device task before being sent. Move ownership instead: update the integrity aggregator with a borrow, then send the value by move. While auditing: the IntegrityAggregator kept eight Option<T> "last_*" fields, but only `last_nav_pl` is ever read back (by compute()). The other seven were write-only dead retention, each adding a per-epoch clone on top of the one already being sent through the channel. Drop those fields and their assignments; the update_*() functions still take the struct by reference to read the specific metric they care about. Net effect per GNSS epoch: zero deep clones in process_serial_data for the four heavy message types, and zero retention clones in the integrity aggregator for the seven unused snapshots.

The old main loop polled `executor.spin(spin_once().timeout(100ms))` with a `sleep(10ms)` between iterations — waking the main tokio thread ~9×/sec and churning the rcl wait set for no reason. Move the executor onto its own OS thread running `spin(SpinOptions::default())`: it blocks in `rcl_wait` and wakes the instant a callback is ready. Shutdown goes through `SpinOptions::until_promise_resolved` (the mechanism that flips the executor's `halt_spinning` flag *and* triggers its guard conditions to wake a blocked wait set), wired to a tokio oneshot that's dropped on shutdown. Effect: at low message rates the polling loop was costing roughly half the steady-state CPU — that goes away. Control-plane callbacks (parameter events, service calls, timers) are serviced immediately rather than within the ~110ms polling cycle. The data path is publish-only and never went through the executor, so publish→subscribe latency is unchanged. All 183 lib tests pass; clean startup/spin/publish/shutdown verified against a live F9P in standalone mode.

gsokoll added 4 commits May 13, 2026 20:48

gsokoll force-pushed the perf/rclrs-spin-thread branch from 4f199f5 to 7f9a29b Compare May 13, 2026 10:48

docs: v0.2.0 CHANGELOG entry

9327023

gsokoll force-pushed the perf/rclrs-spin-thread branch from 4451074 to 9327023 Compare May 13, 2026 10:57

gsokoll merged commit 69232c9 into master May 13, 2026
7 checks passed

gsokoll deleted the perf/rclrs-spin-thread branch May 13, 2026 11:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: device-task hot-path cleanups and dedicated rclrs spin thread#13

perf: device-task hot-path cleanups and dedicated rclrs spin thread#13
gsokoll merged 5 commits into
masterfrom
perf/rclrs-spin-thread

gsokoll commented May 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gsokoll commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Commits

What changed

Headline numbers (vs v0.1.0)

Modes covered

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

gsokoll commented May 13, 2026 •

edited

Loading