Skip to content

fix: register unknown channels on the fly for hole-punched connections#58

Closed
jacderida wants to merge 1 commit into
WithAutonomi:mainfrom
jacderida:feat-nat_traversal
Closed

fix: register unknown channels on the fly for hole-punched connections#58
jacderida wants to merge 1 commit into
WithAutonomi:mainfrom
jacderida:feat-nat_traversal

Conversation

@jacderida

@jacderida jacderida commented Mar 24, 2026

Copy link
Copy Markdown
Collaborator

Summary

Hole-punched connections are accepted at the transport layer and registered in P2pEndpoint::connected_peers, but the event chain to populate TransportHandle::peers may not have completed when the first DHT response is attempted. Instead of returning PeerNotFound, register the channel immediately so the send can proceed.

Also updates saorsa-transport dependency to the feat-nat_traversal_attempts branch which contains the NAT traversal wiring fixes (WithAutonomi/saorsa-transport#25).

Test plan

  • Hole-punched connections can send DHT responses without PeerNotFound errors
  • Tested on 6-node testnet with NATed local node

🤖 Generated with Claude Code

Greptile Summary

This PR fixes a PeerNotFound error that occurred when a hole-punched connection attempted its first DHT response before the ConnectionEvent::Established handler had finished populating TransportHandle::peers. The fix registers a minimal PeerInfo entry on the fly inside send_on_channel, unblocking the send while relying on the subsequent event to backfill the full metadata (addresses, protocols).

  • src/transport_handle.rs: Instead of returning PeerNotFound, send_on_channel now inserts a skeleton PeerInfo (empty addresses, no protocols) when the channel is missing from self.peers. The send then proceeds through the existing is_connection_active guard, which requires the channel to already be in active_connections — an implicit assumption that holds because active_connections is populated first in the same event handler.
  • Cargo.toml / Cargo.lock: Bumps saorsa-transport from the 0.28 crates.io release to the feat-nat_traversal_attempts branch on a personal fork (jacderida/saorsa-transport). This should be updated to point to the saorsa-labs org repository or a versioned release before this lands in main.
  • The on-the-fly insert uses a read-then-write pattern (two separate lock acquisitions) rather than the atomic entry().or_insert_with() idiom, leaving a small TOCTOU window where a racing event handler could insert a fuller PeerInfo that gets overwritten with the empty skeleton.

Confidence Score: 4/5

  • Safe to merge after the personal-fork dependency is updated to the org repo or a versioned release.
  • The core logic is sound and empirically validated on a 6-node testnet. The fix correctly targets the narrow window where active_connections is populated but peers is not. The two concerns — the TOCTOU read/write pattern and the personal-fork Cargo dependency — are both fixable before merge without changing the approach.
  • Cargo.toml — dependency on a personal fork branch should be redirected to the org repo or a crates.io release before merging.

Important Files Changed

Filename Overview
src/transport_handle.rs Adds on-the-fly PeerInfo registration in send_on_channel to handle hole-punched connections whose event chain hasn't fully populated self.peers yet; TOCTOU race between the read-check and write-insert could overwrite a richer PeerInfo entry.
Cargo.toml Switches saorsa-transport from the crates.io release to a git branch on a personal fork (jacderida); should be updated to the org repo or a versioned release before merging.
Cargo.lock Lock file updated to reflect the new git-sourced saorsa-transport dependency; no issues.

Sequence Diagram

sequenceDiagram
    participant T as saorsa-transport
    participant EH as ConnectionEvent Handler
    participant AC as active_connections
    participant P as peers (TransportHandle)
    participant SC as send_on_channel

    T->>EH: ConnectionEvent::Established (hole-punch)
    EH->>AC: write().insert(channel_id)
    Note over EH,P: await point — another task can run here
    EH->>P: write().insert(PeerInfo { addresses, protocols, ... })

    Note over SC: DHT response arrives before EH reaches peers.insert
    SC->>P: read().contains_key(channel_id) → false
    SC->>P: write().insert(PeerInfo { addresses: [], protocols: [] })
    SC->>AC: is_connection_active? → true (already inserted above)
    SC->>T: send_to_peer_optimized(addr, data) ✓
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/transport_handle.rs
Line: 756-772

Comment:
**TOCTOU race: read-check then write is non-atomic**

The read lock is dropped before the write lock is acquired, leaving a window where the background event handler could complete and insert a fully-populated `PeerInfo` (with `addresses`, `protocols`, etc.). If that happens, the subsequent `write().await.insert(...)` will silently overwrite the richer entry with an empty one, losing the address and protocol metadata.

The standard Rust pattern to avoid this is to take a single write-lock and use `entry().or_insert_with(...)`:

```suggestion
        let mut peers = self.peers.write().await;
        peers.entry(channel_id.to_string()).or_insert_with(|| {
            info!(
                "send_on_channel: registering new channel {} on the fly",
                channel_id
            );
            PeerInfo {
                channel_id: channel_id.to_string(),
                addresses: vec![],
                status: ConnectionStatus::Connected,
                last_seen: Instant::now(),
                connected_at: Instant::now(),
                protocols: Vec::new(),
                heartbeat_count: 0,
            }
        });
        drop(peers);
```

This ensures the insert only happens when the key is genuinely absent and avoids the double-lock window.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: Cargo.toml
Line: 60

Comment:
**Dependency points to a personal fork, not the org repository**

The dependency targets `jacderida/saorsa-transport` (a personal fork) rather than `saorsa-labs/saorsa-transport`. Pinning a production dependency to a personal fork branch creates supply-chain risk (the branch can be force-pushed or deleted) and makes reproducible builds harder for other contributors.

The PR description mentions that PR saorsa-labs/saorsa-transport#25 contains the companion fixes. This `Cargo.toml` change should either:
- point to the `saorsa-labs` org fork once that PR is merged, or
- be updated to a versioned crates.io release (`"0.29"` or similar) before this PR lands.

```suggestion
saorsa-transport = { git = "https://github.com/saorsa-labs/saorsa-transport.git", branch = "feat-nat_traversal_attempts" }
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: register unknown channels on the fl..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

Copilot AI review requested due to automatic review settings March 24, 2026 00:53

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent PeerNotFound when sending over newly hole-punched connections by auto-registering unknown transport channels in TransportHandle::peers during the first send, and updates the saorsa-transport dependency to a NAT-traversal-related branch.

Changes:

  • Register missing channel_id entries in TransportHandle::peers inside send_on_channel instead of returning PeerNotFound.
  • Update saorsa-transport dependency from crates.io to a git branch (feat-nat_traversal_attempts).
  • Update Cargo.lock to reflect the new git-sourced transport dependency.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/transport_handle.rs Adds on-the-fly channel registration in send_on_channel to avoid early PeerNotFound during hole-punch flows.
Cargo.toml Switches saorsa-transport dependency to a git branch for NAT traversal wiring fixes.
Cargo.lock Locks saorsa-transport to a specific git commit from the configured branch.

Comment thread src/transport_handle.rs Outdated
Comment thread src/transport_handle.rs Outdated
Comment thread src/transport_handle.rs
Comment thread Cargo.toml
Comment thread src/transport_handle.rs Outdated
Comment thread Cargo.toml
@jacderida jacderida force-pushed the feat-nat_traversal branch from febbe27 to 4216376 Compare March 24, 2026 17:11
Hole-punched connections may not have their channels registered in
TransportHandle::peers when the first DHT response is attempted.
Instead of returning PeerNotFound, register the channel immediately
so the send can proceed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 24, 2026 17:15
@jacderida jacderida force-pushed the feat-nat_traversal branch from 4216376 to 04980ec Compare March 24, 2026 17:15

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

Comment thread Cargo.toml

# Networking
saorsa-transport = "0.28"
saorsa-transport = { git = "https://github.com/jacderida/saorsa-transport.git", branch = "feat-nat_traversal_attempts" }

Copilot AI Mar 24, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saorsa-transport is pulled from a personal fork (jacderida/saorsa-transport) and a moving branch, which increases supply-chain risk and makes downstream builds non-reproducible (downstream users won’t use this repo’s Cargo.lock). Please switch to the saorsa-labs/saorsa-transport repo and pin an immutable rev (or a crates.io release/tag) before merging, especially if saorsa-core is intended to be publishable on crates.io (git deps generally block publishing).

Suggested change
saorsa-transport = { git = "https://github.com/jacderida/saorsa-transport.git", branch = "feat-nat_traversal_attempts" }
saorsa-transport = "0.27"

Copilot uses AI. Check for mistakes.
@mickvandijke

Copy link
Copy Markdown
Collaborator

Work is part of #62

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants