fix(proto): ObservedAddr retransmission by creating a path-specific retransmittable data queue #705
divagant-martian wants to merge 9 commits into
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/noq/pr/705/docs/noq/ (last updated: 2026-06-15T00:24:15Z).
Performance Comparison Report
| Scenario | noq | upstream | Delta | CPU (avg/max) |
|---|---|---|---|---|
| large-single | 5347.4 Mbps | 8304.2 Mbps | -35.6% | 94.9% / 101.0% |
| medium-concurrent | 5505.2 Mbps | 7781.5 Mbps | -29.3% | 95.1% / 102.0% |
| medium-single | 3589.3 Mbps | 4523.4 Mbps | -20.7% | 93.6% / 101.0% |
| small-concurrent | 3760.0 Mbps | 5101.3 Mbps | -26.3% | 90.3% / 98.2% |
| small-single | 3497.5 Mbps | 4709.6 Mbps | -25.7% | 89.8% / 97.6% |
Netsim Benchmarks (network simulation)
| Condition | noq | upstream | Delta |
|---|---|---|---|
| ideal | N/A | 3827.6 Mbps | N/A |
| lan | N/A | 810.4 Mbps | N/A |
| lossy | N/A | 55.9 Mbps | N/A |
| wan | N/A | 83.8 Mbps | N/A |
Summary
noq is 28.7% slower on average
2e7278fc8ca412124fc2bbce94c9ea33d4edbfc5 - artifacts
No results available
eb653ee8c4cbb2c4adc8b307611408ae49dfe5c0 - artifacts
No results available
bc1a9649ea03cb30c3ac4af01d1bce6776886c65 - artifacts
Raw Benchmarks (localhost)
| Scenario | noq | upstream | Delta | CPU (avg/max) |
|---|---|---|---|---|
| large-single | 5717.8 Mbps | 7879.4 Mbps | -27.4% | 97.9% / 99.2% |
| medium-concurrent | 5493.1 Mbps | 7954.9 Mbps | -30.9% | 96.7% / 98.3% |
| medium-single | 4130.8 Mbps | 4749.6 Mbps | -13.0% | 96.2% / 98.6% |
| small-concurrent | 3828.5 Mbps | 5306.7 Mbps | -27.9% | 97.9% / 99.9% |
| small-single | 3550.5 Mbps | 4785.0 Mbps | -25.8% | 95.9% / 98.4% |
Netsim Benchmarks (network simulation)
| Condition | noq | upstream | Delta |
|---|---|---|---|
| ideal | 3107.0 Mbps | 4029.4 Mbps | -22.9% |
| lan | 782.4 Mbps | 810.4 Mbps | -3.4% |
| lossy | 69.8 Mbps | 69.8 Mbps | ~0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |
Summary
noq is 25.0% slower on average
flub left a comment
I think the new PathData::pending_observed_address also needs to be checked in Connection::space_can_send. Perhaps adding a PathData::can_send similar to the existing PacketSpace::can_send is a nice pattern to follow.
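A minimal sketch of the suggested shape, assuming heavily simplified stand-ins for PathData and PacketSpace (the real noq structs have many more fields, and the real space_can_send returns richer information than a bool). The only point it illustrates is that the send decision has to OR in the path-local flag; otherwise a queued OBSERVED_ADDR with nothing else pending would never cause a packet to be built.

```rust
/// Simplified stand-in for noq's per-path state (hypothetical).
#[derive(Default)]
struct PathData {
    /// An OBSERVED_ADDR frame is queued for this specific path.
    pending_observed_address: bool,
}

impl PathData {
    /// Path-level counterpart to `PacketSpace::can_send`: true if this path
    /// has path-specific frames waiting to be sent.
    fn can_send(&self) -> bool {
        self.pending_observed_address
    }
}

/// Simplified stand-in for a packet number space (hypothetical).
#[derive(Default)]
struct PacketSpace {
    /// Stand-in for "the space-wide retransmit queue is non-empty".
    has_pending_frames: bool,
}

impl PacketSpace {
    fn can_send(&self) -> bool {
        self.has_pending_frames
    }
}

/// Sketch of a `space_can_send`-style check for the data space: either the
/// space-wide queue or the path's own queue can justify building a packet.
fn space_can_send(space: &PacketSpace, path: &PathData) -> bool {
    space.can_send() || path.can_send()
}
```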
Overall this is looking good!
     first_packet_after_rtt_sample: None,
     in_flight: InFlight::new(),
    -observed_addr_sent: false,
    +pending_observed_addr: true,
So if this endpoint is not configured to report observed addrs to its peer, this field will always remain true because we initialise it with this default? I think that could be fine, though I would appreciate it if the doc comment on the field called this out, as it is slightly unusual behaviour compared to the rest of the code.
My main concern is that it means we have to be careful when deciding if something can be sent, because the negotiation check essentially has to move there, which is a bit unusual and maybe a bit more bug-prone. Then again, it may be far more practical to have access to the field over there than here.
How difficult would it be to initialise this to the right value? TransportConfig::address_discovery_role is not enough I guess because you need to know what was negotiated. Would adding another parameter to this function be horrible?
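One possible answer, sketched under assumptions: pass the negotiated outcome into the constructor so the flag starts as true only when reporting is actually allowed. The Role struct, should_send_observed_addr and the send_observed_addr parameter are all hypothetical. They model the point above that the local TransportConfig::address_discovery_role alone is not enough, because the peer's advertised role also matters.

```rust
/// Hypothetical model of an address-discovery role; the real noq type
/// behind `TransportConfig::address_discovery_role` may look different.
#[derive(Clone, Copy)]
struct Role {
    send_reports: bool,
    receive_reports: bool,
}

/// Report observed addresses only if we are willing to send them and the
/// peer advertised that it wants to receive them.
fn should_send_observed_addr(local: Role, peer: Option<Role>) -> bool {
    match peer {
        Some(peer) => local.send_reports && peer.receive_reports,
        // The peer did not negotiate the extension at all.
        None => false,
    }
}

/// Simplified stand-in for noq's per-path state (hypothetical).
struct PathData {
    /// An OBSERVED_ADDR frame still needs to go out on this path. Starts as
    /// `true` only when reporting was negotiated.
    pending_observed_addr: bool,
}

impl PathData {
    /// `send_observed_addr` is the assumed extra constructor parameter.
    fn new(send_observed_addr: bool) -> Self {
        Self {
            pending_observed_addr: send_observed_addr,
        }
    }
}
```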
Alright, I made this a whole "pending" and queueing is explicit now. We do have to take more care, but this sets up the boilerplate for any other path-specific pending frames. Is this too much? Not sure; we can leave it (yay, zero-cost abstractions) or otherwise go back to a single bool. The previous logic worked, but it was also very outside the norm: the code used to treat this as must-send-always, and negotiation was only checked at send time.
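A sketch of what the explicit per-path "pending" boilerplate can look like, assuming a PathRetransmits struct with a single observed_address flag (other path-specific frames would become extra fields). It follows the usual shape of a retransmit queue: is_empty() for send decisions and |= for merging a lost packet's frames back into the queue. The names and layout are illustrative, not the PR's actual code.

```rust
use std::ops::BitOrAssign;

/// Frames that must be (re)sent on one specific path (hypothetical layout).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct PathRetransmits {
    /// An OBSERVED_ADDR frame is queued for this path.
    observed_address: bool,
    // Future path-specific frames would be added here as new fields.
}

impl PathRetransmits {
    /// Nothing path-specific is waiting to be sent.
    fn is_empty(&self) -> bool {
        !self.observed_address
    }
}

/// Merging lets a lost packet's path-specific frames be re-queued with a
/// single `pending |= lost` statement.
impl BitOrAssign for PathRetransmits {
    fn bitor_assign(&mut self, rhs: Self) {
        self.observed_address |= rhs.observed_address;
    }
}
```

Because the struct is plain data with a derived Default, adding a field costs nothing at runtime beyond the bool itself, which is the "zero-cost" trade-off mentioned above.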
    -/// Retransmittable data queue
    +/// Retransmittable data queue.
    +///
    +/// Data in this queue must be retransmittable over any path.
Again, I would focus on the SpaceKind rather than saying something about paths.
Tests are failing for a very interesting reason. Before, observed addresses were not queued proactively but sent opportunistically, so OBSERVED_ADDR was always sent with the first Data datagram, triggered by some other kind of frame. Now it is sent earlier, even before HandshakeDone (which is correct and allowed by the spec as "0.5-RTT" data). This might be more efficient for small handshakes, because the first data packet, containing only observed address reports, is coalesced instead of the datagram being padded, which in turn frees up space in the next data packet for new connection IDs. For large handshakes, though, it is less efficient: observed addresses are ready before new connection IDs, so with OBSERVED_ADDR not fitting inside the first datagram (again, a large handshake), it is sent in a datagram of its own (the new connection IDs are not ready yet) and the new connection IDs are sent in a third datagram. About "fixing" this, whatever that means, there are a couple of considerations:
This is the summary of why the PR is red.
flub left a comment
Basically LGTM I think, though the tests need fixing I guess :)
    -path.observed_addr_sent = true;
    -space.pending.observed_addr = false;
    +path.pending.observed_address = false;
E.g. HANDSHAKE_DONE, PING and a few others follow the pattern of mem::replace(&mut path.pending.observed_address, false) in the if condition above, as a way of doing this in one line. Might want to follow the same pattern here? Not sure how much it matters; I'm assuming the compiler optimises it all to the same thing, but it is maybe slightly less error-prone. Or maybe it is just a style thing.
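For reference, the two styles side by side, using a hypothetical minimal Pending struct and a Vec<&str> standing in for the frame encoder. std::mem::replace stores the new value and returns the old one, so the test and the clear happen in a single expression and the flag cannot be left set by accident. For a bool, std::mem::take would behave the same way, since bool's default is false.

```rust
use std::mem;

/// Hypothetical stand-in for the path's pending frames.
#[derive(Default)]
struct Pending {
    observed_address: bool,
}

/// Check-then-clear in two separate steps.
fn write_two_step(pending: &mut Pending, frames: &mut Vec<&'static str>) {
    if pending.observed_address {
        frames.push("OBSERVED_ADDR");
        pending.observed_address = false; // easy to forget or misplace
    }
}

/// Test-and-clear inside the `if` condition, the style used for
/// HANDSHAKE_DONE and PING according to the review.
fn write_replace(pending: &mut Pending, frames: &mut Vec<&'static str>) {
    if mem::replace(&mut pending.observed_address, false) {
        frames.push("OBSERVED_ADDR");
    }
}
```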
Description
- Add PathRetransmits to track data that must be sent over a specific path.
- Use it for OBSERVED_ADDR frames. This was not done correctly before, because the loss of a packet containing these frames would set a global flag for the frame to be re-sent, which was then picked up by the first path. Think of a path "stealing" the retransmission of the frame from another path.
- Swap observed_addr_sent for pending_observed_addr to follow the naming convention in the codebase.
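A toy model of the bug and the fix, assuming paths keyed by a plain u64 id and a single observed-address flag (all names are illustrative, not noq's types). With one connection-wide flag, a lost OBSERVED_ADDR from path 1 is picked up by whichever path builds the next packet, here path 0. Keeping the flag per path and re-queueing a lost frame on the path it was sent on keeps the report on the path whose address it describes.

```rust
use std::collections::BTreeMap;

type PathId = u64;

/// Old shape: one connection-wide flag shared by all paths.
#[derive(Default)]
struct GlobalPending {
    observed_addr: bool,
}

impl GlobalPending {
    /// A packet carrying OBSERVED_ADDR was lost; which path it was on is forgotten.
    fn on_lost(&mut self, _path: PathId) {
        self.observed_addr = true;
    }

    /// Whichever path sends next takes the frame.
    fn poll(&mut self, _path: PathId) -> bool {
        std::mem::replace(&mut self.observed_addr, false)
    }
}

/// New shape: one pending flag per path.
#[derive(Default)]
struct PerPathPending {
    observed_address: BTreeMap<PathId, bool>,
}

impl PerPathPending {
    /// Re-queue the frame on the path whose packet was lost.
    fn on_lost(&mut self, path: PathId) {
        self.observed_address.insert(path, true);
    }

    /// A path only sends the frames queued for that path.
    fn poll(&mut self, path: PathId) -> bool {
        self.observed_address
            .get_mut(&path)
            .map_or(false, |pending| std::mem::replace(pending, false))
    }
}
```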
Partially replaces #674
Breaking Changes
Notes & open questions
Much simpler and more elegant this way.
As I mentioned in #674, a test is not really possible because of the multiple "safeguards" that prevent this from happening.
Change checklist