feat(reachability): gate relay publication with canary quorum #136
Open
mickvandijke wants to merge 6 commits into
Add a relay canary request/response protocol so newly acquired MASQUE relay addresses are cold-dialed by independent close-group witnesses before they enter DHT self-record gossip. Keep legacy ADD_ADDRESS relay hints out of DHT records, retain canary-rejected relayers across ordinary acquisition failures, and return typed request timeouts so unreachable relay probes count as failed witness attempts.
SemVer: minor
Align the relay canary docs with the randomized non-close witness selection, version the internal canary request/response topic, and add driver-level tests for canary rejection retention across acquisition failures. Add structured rollout fields for witness availability and ineligible responses, and link the transport cleanup follow-up for rejected MASQUE allocations.
SemVer: patch
…elayer knowledge
Witnesses previously refused to probe a relay address unless they already held a Direct address for the named relayer (relay_canary_addr_matches_relayer_record). Witnesses are chosen as non-close peers while the relayer is drawn from the target's close group, so at scale a random witness almost never knows the relayer: canaries returned RelayerUnknown, quorum fell to InsufficientWitnesses, and relays were never published, leaving NAT'd nodes unreachable.
The relayer-knowledge check was only an anti-amplification rail, not part of verification (the cold dial plus identity check needs just the relay address and the target identity). Replace it with a per-source token-bucket rate limiter (one dial per 10s per source, reusing crate::rate_limit::Engine, LRU-bounded), which also subsumes the former per-source in-flight concurrency guard. Throttled sources receive WitnessRateLimited (Ineligible), so they never count as a probe failure and cannot trigger a false relay rejection.
SemVer: patch
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
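The per-source throttle described in this commit can be sketched roughly as follows. This is an illustration only: `SourceLimiter` and `try_acquire` are hypothetical names, not the API of `crate::rate_limit::Engine`, and the sketch relies on the observation that a token bucket with capacity one and one refill per interval reduces to remembering when each source was last admitted. Eviction here drops the source admitted longest ago, which approximates an LRU bound.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-source limiter: at most one canary dial per `interval`
/// for each requesting source. Not the real `crate::rate_limit::Engine`.
pub struct SourceLimiter {
    interval: Duration,
    max_sources: usize,
    last_admitted: HashMap<String, Instant>,
}

impl SourceLimiter {
    pub fn new(interval: Duration, max_sources: usize) -> Self {
        Self {
            interval,
            // Track at least one source so the bound is always meaningful.
            max_sources: max_sources.max(1),
            last_admitted: HashMap::new(),
        }
    }

    /// Returns `true` if `source` may trigger a dial at `now`.
    /// A `false` result is where a witness would answer with an
    /// ineligible reply rather than a probe failure.
    pub fn try_acquire(&mut self, source: &str, now: Instant) -> bool {
        let throttled = self
            .last_admitted
            .get(source)
            .map_or(false, |last| now.saturating_duration_since(*last) < self.interval);
        if throttled {
            return false;
        }

        // Bound memory: before tracking a new source, evict the one that
        // was admitted longest ago.
        if !self.last_admitted.contains_key(source)
            && self.last_admitted.len() >= self.max_sources
        {
            let oldest = self
                .last_admitted
                .iter()
                .min_by_key(|(_, t)| **t)
                .map(|(k, _)| k.clone());
            if let Some(key) = oldest {
                self.last_admitted.remove(&key);
            }
        }

        self.last_admitted.insert(source.to_owned(), now);
        true
    }
}
```

Passing `now` explicitly keeps the sketch deterministic to test. One consequence of bounding the table is that an evicted source loses its throttle state, so the bound should comfortably exceed the number of sources expected within one interval.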
The canary exclusion set was preserved on RelayAcquisitionOutcome::Failed but cleared on every other outcome. If the only close Direct candidate is an excluded relayer, acquisition fails every round and that relayer is never retried, leaving the node permanently relay-less.
Clear the set on AcquisitionFailed too, matching the InsufficientWitnesses policy: exclusions now accumulate only across a contiguous run of Rejected verdicts and reset on any other outcome. Backoff rate-limits retries and a still-unreachable relay is simply re-excluded the next round.
SemVer: patch
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
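The reset policy from this commit, where exclusions accumulate only across a contiguous run of Rejected verdicts, might look like the sketch below. The `RoundOutcome` enum and `Exclusions` type are hypothetical stand-ins chosen for illustration. They borrow the outcome names used in the commit message but are not the driver's actual types.

```rust
use std::collections::HashSet;

/// Hypothetical per-round outcome, named after the verdicts in the commit text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoundOutcome {
    Published,
    Rejected,
    InsufficientWitnesses,
    AcquisitionFailed,
}

/// Hypothetical set of relayers to skip on the next acquisition round.
#[derive(Debug, Default)]
pub struct Exclusions {
    relayers: HashSet<String>,
}

impl Exclusions {
    /// Records the outcome of one round. Only `Rejected` extends the set;
    /// every other outcome, including `AcquisitionFailed`, clears it.
    pub fn record(&mut self, outcome: RoundOutcome, relayer: Option<&str>) {
        match outcome {
            RoundOutcome::Rejected => {
                if let Some(r) = relayer {
                    self.relayers.insert(r.to_owned());
                }
            }
            _ => self.relayers.clear(),
        }
    }

    pub fn is_excluded(&self, relayer: &str) -> bool {
        self.relayers.contains(relayer)
    }
}
```

Under this shape, a node whose only candidate was excluded gets to retry it after a single failed acquisition round, which is the behaviour the commit describes.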
Do not count request-level relay canary errors as failed relay probes. In mixed-version networks an older witness can authenticate and ignore /rr/relay-canary-v1, producing a timeout without ever evaluating the relay. Keep explicit canary-capable DialFailed, IdentityExchangeFailed, and IdentityMismatch responses as eligible relay failures.
SemVer: patch
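The classification in this commit can be illustrated with a small tally, sketched here under assumptions. `DialFailed`, `IdentityExchangeFailed`, `IdentityMismatch`, and `WitnessRateLimited` come from the commit messages. `Verified`, `RequestTimeout`, and the `Tally` struct are hypothetical names. The quorum thresholds applied to the tally are not stated in this PR text, so they are left out.

```rust
/// Hypothetical per-witness result. `Verified` and `RequestTimeout` are
/// illustrative names; the failure variants mirror the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WitnessResult {
    Verified,
    DialFailed,
    IdentityExchangeFailed,
    IdentityMismatch,
    WitnessRateLimited,
    /// Request-level error, e.g. an older witness that ignores the canary topic.
    RequestTimeout,
}

/// Counts of eligible passes, eligible failures, and ineligible replies.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Tally {
    pub passed: usize,
    pub failed: usize,
    pub ineligible: usize,
}

pub fn tally(results: &[WitnessResult]) -> Tally {
    let mut t = Tally::default();
    for r in results {
        match r {
            WitnessResult::Verified => t.passed += 1,
            // Explicit answers from canary-capable witnesses count against the relay.
            WitnessResult::DialFailed
            | WitnessResult::IdentityExchangeFailed
            | WitnessResult::IdentityMismatch => t.failed += 1,
            // The witness never evaluated the relay, so no verdict is implied.
            WitnessResult::WitnessRateLimited | WitnessResult::RequestTimeout => {
                t.ineligible += 1
            }
        }
    }
    t
}
```

Keeping timeouts out of `failed` means a network with many older witnesses can only lower the number of eligible votes. It cannot by itself produce a rejection.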
fe26eef to
5f4c745
Summary
Adds a canary-gated relay publication flow for MASQUE relay acquisition. A newly acquired relay address is now verified by randomized independent non-close witnesses before it is published into the DHT self-record, keeping the DHT as a peer phonebook while ensuring relay reachability is proven before gossip.
Changes
- reachability::canary, an internal versioned request/response protocol (relay-canary-v1) that asks selected witnesses to cold-dial a freshly allocated relay address and verify the authenticated target identity.
- … DhtNetworkManager using the generic transport request/response envelope rather than a DHT operation.
- … ADD_ADDRESS relay hints in the DHT bridge so unverified relay allocations cannot bypass the sequenced self-record path.
- … P2PError::Timeout.
- … DialFailed before the handler timeout, and classified canary request timeouts as failed witness attempts rather than ineligible witnesses.
Follow-up
SemVer
SemVer: patch.
Validation
- cargo fmt --all -- --check
- git diff --check
- cargo test reachability:: --all-features
- cargo clippy --all-targets --all-features -- -D warnings -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used
- cargo test --all-features
Target
rc-2026.6.2