Skip to content

fix(proto): ObservedAddr retranmission by creating a path-specific retransmittable data queue#705

Open
divagant-martian wants to merge 9 commits into
mainfrom
per-generation-retransmission
Open

fix(proto): ObservedAddr retranmission by creating a path-specific retransmittable data queue#705
divagant-martian wants to merge 9 commits into
mainfrom
per-generation-retransmission

Conversation

@divagant-martian

@divagant-martian divagant-martian commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Description

  • Adds PathRetransmits to track data that must be sent over a specific path.
  • Fixes re-transmission of OBSERVED_ADDR frames. This was not done correctly
    because the loss of a packet containing these frames would set a global
    variable to be re-sent, that was then picked up by the first path. Think a
    path "stealing" re-transmission of the frame from another path.
  • Naming standards: Replaces observed_addr_sent for pending_observed_addr to
    use the naming convention in the codebase.

Partially replaces #674

Breaking Changes

Notes & open questions

Much simpler and elegant this way.
As I mentioned in #674 a test is not really possible because of the multiple
"safeguards" that prevents this from happening

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.

@divagant-martian divagant-martian added this to the noq: iroh v1.0 + 1 milestone Jun 12, 2026
@divagant-martian divagant-martian added the QAD QUIC Address Discovery extension label Jun 12, 2026
@divagant-martian divagant-martian self-assigned this Jun 12, 2026
@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/noq/pr/705/docs/noq/

Last updated: 2026-06-15T00:24:15Z

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown

Performance Comparison Report

8ae84250f8c4af440655876abc582381426badd4 - artifacts

Raw Benchmarks (localhost)

Scenario noq upstream Delta CPU (avg/max)
large-single 5347.4 Mbps 8304.2 Mbps -35.6% 94.9% / 101.0%
medium-concurrent 5505.2 Mbps 7781.5 Mbps -29.3% 95.1% / 102.0%
medium-single 3589.3 Mbps 4523.4 Mbps -20.7% 93.6% / 101.0%
small-concurrent 3760.0 Mbps 5101.3 Mbps -26.3% 90.3% / 98.2%
small-single 3497.5 Mbps 4709.6 Mbps -25.7% 89.8% / 97.6%

Netsim Benchmarks (network simulation)

Condition noq upstream Delta
ideal N/A 3827.6 Mbps N/A
lan N/A 810.4 Mbps N/A
lossy N/A 55.9 Mbps N/A
wan N/A 83.8 Mbps N/A

Summary

noq is 28.7% slower on average

---
2e7278fc8ca412124fc2bbce94c9ea33d4edbfc5 - artifacts

No results available

---
eb653ee8c4cbb2c4adc8b307611408ae49dfe5c0 - artifacts

No results available

---
bc1a9649ea03cb30c3ac4af01d1bce6776886c65 - artifacts

Raw Benchmarks (localhost)

Scenario noq upstream Delta CPU (avg/max)
large-single 5717.8 Mbps 7879.4 Mbps -27.4% 97.9% / 99.2%
medium-concurrent 5493.1 Mbps 7954.9 Mbps -30.9% 96.7% / 98.3%
medium-single 4130.8 Mbps 4749.6 Mbps -13.0% 96.2% / 98.6%
small-concurrent 3828.5 Mbps 5306.7 Mbps -27.9% 97.9% / 99.9%
small-single 3550.5 Mbps 4785.0 Mbps -25.8% 95.9% / 98.4%

Netsim Benchmarks (network simulation)

Condition noq upstream Delta
ideal 3107.0 Mbps 4029.4 Mbps -22.9%
lan 782.4 Mbps 810.4 Mbps -3.4%
lossy 69.8 Mbps 69.8 Mbps ~0%
wan 83.8 Mbps 83.8 Mbps ~0%

Summary

noq is 25.0% slower on average

@n0bot n0bot Bot added this to iroh Jun 12, 2026
@github-project-automation github-project-automation Bot moved this to 🚑 Needs Triage in iroh Jun 12, 2026
@flub flub moved this from 🚑 Needs Triage to 👀 In review in iroh Jun 12, 2026

@flub flub left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new PathData::pending_observed_address also needs to be checked in Connection::space_can_send. Perhaps adding a PathData::can_send similar to the existing PacketSpace::can_send is a nice pattern to follow.

Overall this is looking good!

Comment thread noq-proto/src/connection/paths.rs Outdated
Comment thread noq-proto/src/connection/paths.rs Outdated
first_packet_after_rtt_sample: None,
in_flight: InFlight::new(),
observed_addr_sent: false,
pending_observed_addr: true,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if this endpoint is not configured to report observed addrs to it's peer this field will always remain true because we initialise it with this default? I think that could be fine, though would appreciate if the doc comment on the field called this out as it is slightly unusual behaviour compared to the rest of the code.

My main concern is that it means we have to be careful when deciding if something can be sent though. Because the check essentially has to be moved there, which is a bit unusual and maybe a bit more bug prone. But then it may be way more practical to have access to the field over there than here.

How difficult would it be to initialise this to the right value? TransportConfig::address_discovery_role is not enough I guess because you need to know what was negotiated. Would adding another parameter to this function be horrible?

This comment was marked as off-topic.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I made this a whole "pending" and queueing is explicit now. We do have to take more care but this sets the boilerplate for any other path-specific pending. Is this too much? not sure, we can leave it (yay zero-cost abstractions) and go back to a single bool otherwise. The previous logic worked but also it's very outside the norm. The code used to treat this a a must-send-always and negotiation was checked only to send

Comment thread noq-proto/src/connection/spaces.rs Outdated
Comment thread noq-proto/src/connection/spaces.rs
Comment thread noq-proto/src/connection/spaces.rs Outdated
Comment thread noq-proto/src/connection/spaces.rs Outdated
/// Retransmittable data queue
/// Retransmittable data queue.
///
/// Data in this queue must be retransmittable over any path.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I again would concentrate on the SpaceKind more than saying something about paths.

@divagant-martian divagant-martian changed the title fix(proto): ObservedAddr retranmission by creating a path-specific retransmitable data queue fix(proto): ObservedAddr retranmission by creating a path-specific retransmittable data queue Jun 14, 2026
@divagant-martian

Copy link
Copy Markdown
Collaborator Author

Tests are failing for a very interesting reason. Before, since we didn't have proactive queueing of observed address, but it was oportunistic instead, OBSERVED_ADDR was always sent with the first Data datagram, triggered by other kind of frame. Now it's sent earlier, even earlier than HandshakeDone (which is correct, allowed by the spec, using "0.5 rtt").

This might be more efficient for small handshakes because the first data packet containing only observed address reports is coalesced instead of the datagram being padded, which in turns frees up space in the next data packet for new connection ids.

For large handshakes tho, it's less efficient, as observed address are ready before new connection ids, so, with OBSERVED_ADDR not fitting inside the first datagram (again, large handshake) it's sent on one datagram on its own (new connection ids are not ready yet) and the new connection ids are sent on a third datagram.

About "fixing" this, whatever that means, there are a couple considerations:

  1. is the new behaviour desirable? certainly it's more "correct" and OBSERVED_ADDR frames are no longer "scaping" scheduling logic, which we had not even noticed happened.
  2. retransmission testing becomes slightly harder, because handshake size (as far as I can tell) is not currently deterministic, fixed size. It would be harder to test as it's not always the same datagram (the first or the second) that needs to be lost. This can be adapted in a couple of ways - fixed cert, maybe check stats to know if the frame has been sent out.
  3. if address discovery is enabled, the Path::ObservedAddr event is always emitted before HandshakeDone due to the nature of 0.5rtt. Tests don't expect this, as they depend on specific events, specific order to declare the connection ready. This, as the previous one, is more an issue of a a hassle on tests rather than an impediment of any kind.
  4. What about bringing "back" the old behaviour? This is noticed in two levels: frame sending/receiving and event emission, so it's worth thinking about the two as separate. For the first, there isn't any kind of decent adaptation that keeps a per-path pending queue and the old broken behaviour. The old behaviour conflated sending the path specific data when the general space data had something available. These are different concerns that don't really make sense making a single thing. Event emission can either be modified to send reports out to the application after handshake done, or adapt the test machinery to deal with it. I don't think either is particularly relevant.

this is the summary of why PR is red

@flub

flub commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator
  1. This is absolutely desirable! I love that we're coalescing this with the handshake packet. So much better than wasting those padding bytes. In fact I also think that the NEW_CONNECTION_IDs should also coalesce and kind of consider it a bug that they don't: Wasteful datagram use by the server in the handshake #66
  2. Yeah, that's awkward. Though I think a fair amount of infra already exists to make it deterministic. Maybe indeed the cert can also be fixed.
  3. Agree the order the tests expect events in can change, as long as we manage to keep them deterministic. It increases the diff but otherwise no big deal.

@flub flub left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM I think, though the tests need fixing I guess :)

path.observed_addr_sent = true;

space.pending.observed_addr = false;
path.pending.observed_address = false;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. HANDSHAKE_DONE and PING and a few other follow the pattern of mem::replace(&mut path.pending.observed_address, false) in the if condition above, as a way of doing this in one line. Might want to follow the same pattern here? Not sure how much it matters, I'm assuming the compiler optimises it all to the same, but it is maybe slightly less error prone. Or maybe it is just a style thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

QAD QUIC Address Discovery extension

Projects

Status: 👀 In review

Development

Successfully merging this pull request may close these issues.

2 participants