Summary
From a focused security audit of src/relay_server.rs. The relay's DoS posture rests on a single global rate bucket plus a per-slot byte quota; several routes have no governor at all, and there is no cap on the number of objects a caller can create. The findings are grouped because they share a root cause (no per-IP keying plus unbounded allocation) and are best fixed together.
H2 — global rate-limit bucket (shared-fate DoS)
relay_server.rs:384: the governor uses GlobalKeyExtractor, so one bucket is shared by /v1/slot/allocate, /v1/pair, /v1/pair/:id/bootstrap, and /v1/pair/abandon. One client hammering allocate consumes the global budget (10/s + 50 burst), so every other agent's pair/allocate calls get 429. The rate limiter itself becomes the DoS vector.
Note: the code comment (:376) flags per-IP keying as a deliberate v0.2 deferral ("Cloudflare WAF + global cap in series"). The issue is still real for self-hosted or local relays with no edge WAF.
Fix: per-IP key_extractor via ConnectInfo (SmartIpKeyExtractor).
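To make the intended behaviour concrete, here is a minimal std-only sketch of per-IP keying. It is not the governor crate's API and not the relay's code: PerIpLimiter, Bucket, and check are hypothetical names, and the clock is passed in so the behaviour is deterministic. The 10/s rate and 50 burst are the figures quoted above.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::Instant;

/// One token bucket: holds up to `burst` tokens and refills at `rate` per second.
struct Bucket {
    tokens: f64,
    last: Instant,
}

/// Hypothetical per-IP limiter (illustrative only, not from relay_server.rs).
/// Note: this map needs its own eviction or cap in practice, or it repeats H1.
pub struct PerIpLimiter {
    rate: f64,
    burst: f64,
    buckets: HashMap<IpAddr, Bucket>,
}

impl PerIpLimiter {
    pub fn new(rate: f64, burst: f64) -> Self {
        Self { rate, burst, buckets: HashMap::new() }
    }

    /// Returns true if a request from `ip` at time `now` is allowed.
    pub fn check(&mut self, ip: IpAddr, now: Instant) -> bool {
        let (rate, burst) = (self.rate, self.burst);
        // A new client starts with a full bucket.
        let b = self
            .buckets
            .entry(ip)
            .or_insert(Bucket { tokens: burst, last: now });
        // Refill for the time elapsed since the last request, capped at `burst`.
        let elapsed = now.saturating_duration_since(b.last).as_secs_f64();
        b.tokens = (b.tokens + elapsed * rate).min(burst);
        b.last = now;
        if b.tokens >= 1.0 {
            b.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With a key per client address, a client that drains its own bucket gets 429s while other addresses keep their full budget. That is the property the shared bucket lacks.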
H1 — unbounded allocation (RAM + disk exhaustion)
allocate_slot (:782, unauthenticated), pair_open (:999), invite_register (:1375), handle_claim (:1156). No cap on the count of slots/handles/pairs/invites per caller; the in-RAM HashMaps grow without bound, and tokens.json is rewritten whole on every allocation (O(n) write amplification).
Fix: per-IP cap + a hard global ceiling on object counts; batched/append token persistence.
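A rough sketch of the two limits working together, assuming the allocation handlers can consult a shared counter before inserting. AllocationGuard, AllocError, try_reserve, and release are hypothetical names, and the cap values are left to the caller.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

#[derive(Debug, PartialEq)]
pub enum AllocError {
    PerIpCapReached,
    GlobalCeilingReached,
}

/// Hypothetical counter consulted before a slot/handle/pair/invite is created.
pub struct AllocationGuard {
    per_ip_cap: usize,
    global_ceiling: usize,
    total: usize,
    per_ip: HashMap<IpAddr, usize>,
}

impl AllocationGuard {
    pub fn new(per_ip_cap: usize, global_ceiling: usize) -> Self {
        Self { per_ip_cap, global_ceiling, total: 0, per_ip: HashMap::new() }
    }

    /// Reserve one object for `ip`, or report which limit refused it.
    pub fn try_reserve(&mut self, ip: IpAddr) -> Result<(), AllocError> {
        // The global ceiling is checked first: it bounds memory no matter
        // how many distinct addresses an attacker controls.
        if self.total >= self.global_ceiling {
            return Err(AllocError::GlobalCeilingReached);
        }
        let current = self.per_ip.get(&ip).copied().unwrap_or(0);
        if current >= self.per_ip_cap {
            return Err(AllocError::PerIpCapReached);
        }
        self.per_ip.insert(ip, current + 1);
        self.total += 1;
        Ok(())
    }

    /// Give back one reservation when an object expires or is deleted.
    pub fn release(&mut self, ip: IpAddr) {
        if let Some(count) = self.per_ip.get_mut(&ip) {
            *count -= 1;
            let now_empty = *count == 0;
            self.total -= 1;
            if now_empty {
                // Drop empty entries so the counter map stays bounded too.
                self.per_ip.remove(&ip);
            }
        }
    }
}
```

The per-IP cap limits any single caller; the global ceiling is the backstop against many addresses. Neither addresses the whole-file rewrite of tokens.json, which needs the separate persistence change.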
M3 — /v1/handles deep-clones the whole handle map, ungoverned
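~:1553: inner.handles.values().cloned().collect() (each record carries a full card), followed by a sort, on every request, with no governor (route at ~:434). Quadratic amplifier with H1: after n unbounded allocations, each of n ungoverned requests clones and sorts all n records.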
Fix: put read endpoints under the (per-IP) governor; paginate from a sorted index.
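One way to paginate without cloning or sorting the whole map is to keep the records in an ordered map and serve a bounded range per request. The sketch below assumes a BTreeMap keyed by a string (for example the nick); the page function, MAX_PAGE, and the cursor scheme are hypothetical and not taken from relay_server.rs.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Assumed upper bound on page size; the real value is a policy choice.
pub const MAX_PAGE: usize = 100;

/// Return up to `limit` entries strictly after `after`, plus a cursor for
/// the next page (None when the end of the index has been reached).
pub fn page<'a, V>(
    index: &'a BTreeMap<String, V>,
    after: Option<&str>,
    limit: usize,
) -> (Vec<(&'a String, &'a V)>, Option<String>) {
    let limit = limit.clamp(1, MAX_PAGE);
    let lower = match after {
        Some(cursor) => Bound::Excluded(cursor),
        None => Bound::Unbounded,
    };
    // The range walks the already-sorted tree: no full clone, no sort.
    let mut iter = index.range::<str, _>((lower, Bound::Unbounded));
    let items: Vec<_> = iter.by_ref().take(limit).collect();
    // Hand out a cursor only if at least one more entry exists.
    let next = if iter.next().is_some() {
        items.last().map(|(k, _)| (*k).clone())
    } else {
        None
    };
    (items, next)
}
```

Per-request work is then proportional to the page size plus a tree lookup, so growth of the map no longer multiplies the cost of each read.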
M4 — /stats.history re-reads an unbounded file, ungoverned
~:622: read_to_string + per-line JSON parse of stats-history.jsonl (grows ~720 KB/day; the prune is deferred to a "future wave"), with no governor (route ~:430).
Fix: implement the prune; cap read size; rate-limit.
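Capping the read size can be sketched as reading only the tail of the file. This is a std-only illustration: read_tail_lines and its max_bytes parameter are hypothetical, JSON parsing is left out, and the real handler may need a different window.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Read at most `max_bytes` from the end of a line-oriented file and return
/// only complete lines, so memory use is bounded regardless of file size.
pub fn read_tail_lines(path: &Path, max_bytes: u64) -> io::Result<Vec<String>> {
    let mut file = File::open(path)?;
    let len = file.metadata()?.len();
    let start = len.saturating_sub(max_bytes);
    file.seek(SeekFrom::Start(start))?;

    let mut buf = Vec::new();
    // `take` enforces the cap even if the file grows while we read.
    file.take(max_bytes).read_to_end(&mut buf)?;

    let text = String::from_utf8_lossy(&buf);
    let mut lines = text.lines();
    if start > 0 {
        // We began mid-file, so the first line may be partial. Dropping it is
        // conservative: a whole line is lost if `start` fell on a boundary.
        lines.next();
    }
    Ok(lines
        .filter(|l| !l.trim().is_empty())
        .map(str::to_owned)
        .collect())
}
```

This bounds the cost of a single request only; the file still grows until the prune exists, and the route still needs a rate limit.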
Not bugs (verified)
slot_token is never returned by .well-known / /v1/handles (only slot_id).
unwrap() / expect() on request-derived data → no crash-DoS.
slot_id / nick path params validated before filesystem use (after fix(relay): validate claim relay_url + responder-health slot_id; fix false rate-limit comment (security audit) #290).
Related