[ecmp] Deduplicate redundant multicast next hops via per-target ECMP selection #1012
Open
zeeshanlakhani wants to merge 2 commits into
The multicast forwarding loop emitted a copy to every programmed next hop. A next hop is a switch endpoint that fans the packet out behind its port groups, either to every sled (underlay) or out the front panel (external). Redundant next hops that share a replication target therefore reach the same destinations, and each one delivers its own copy of the flow. Because a receiver cannot tell the duplicates apart, we deduplicate at Tx.
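To make per-target deduplication concrete, here is a minimal Rust sketch. It assumes hypothetical `NextHop` and `Replication` types (not OPTE's actual definitions) and a precomputed per-flow hash taken modulo the candidate count, a simple stand-in for whatever ECMP hash the real code uses. Each target (external, underlay) picks exactly one covering hop in its own pass, and the two picks are composed into an effective replication per hop: a `Both` hop picked for only one target is narrowed, and a hop picked for neither is skipped.

```rust
/// Hypothetical replication mode of a next hop (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Replication {
    External,
    Underlay,
    Both,
}

impl Replication {
    /// True if this mode sends copies out the front panel.
    fn covers_external(self) -> bool {
        matches!(self, Replication::External | Replication::Both)
    }
    /// True if this mode sends copies to sleds over the underlay.
    fn covers_underlay(self) -> bool {
        matches!(self, Replication::Underlay | Replication::Both)
    }
}

/// Hypothetical next-hop entry; `id` stands in for a switch address.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NextHop {
    id: u32,
    repl: Replication,
}

/// One pass: the index of the single hop chosen for a target,
/// or None if no programmed hop covers that target.
fn pick(hops: &[NextHop], flow_hash: u64, covers: fn(Replication) -> bool) -> Option<usize> {
    let candidates: Vec<usize> = hops
        .iter()
        .enumerate()
        .filter(|(_, h)| covers(h.repl))
        .map(|(i, _)| i)
        .collect();
    if candidates.is_empty() {
        return None;
    }
    // Same flow hash => same choice for every packet of the flow.
    Some(candidates[(flow_hash % candidates.len() as u64) as usize])
}

/// Compose the two per-target picks into (hop id, effective replication)
/// for each hop that should receive a copy; unpicked hops are skipped.
fn select_effective(hops: &[NextHop], flow_hash: u64) -> Vec<(u32, Replication)> {
    let ext = pick(hops, flow_hash, Replication::covers_external);
    let und = pick(hops, flow_hash, Replication::covers_underlay);
    hops.iter()
        .enumerate()
        .filter_map(|(i, h)| match (ext == Some(i), und == Some(i)) {
            (true, true) => Some((h.id, Replication::Both)),
            (true, false) => Some((h.id, Replication::External)),
            (false, true) => Some((h.id, Replication::Underlay)),
            (false, false) => None,
        })
        .collect()
}
```

With two redundant `Both` hops, a naive loop would send two copies to each target; here each target gets exactly one copy, and all hops remain in the table so a different hash (or a withdrawn hop) simply shifts the choice.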
This PR adds `select_nexthops`, which resolves one next hop per replication target in two passes over the forwarding table. The forwarding loop then composes the two selections into each hop's effective replication. The composition is needed because a `Both` replication hop may be chosen for one target but not the other: it is narrowed to the target it was chosen for, and skipped when it is chosen for neither. All next hops stay programmed for failover.

Also included:
- Tests: add an Rx-path isolation test that drives `handle_mcast_rx` directly by injecting a raw Geneve-over-IPv6 multicast frame over DLPI, with no guest Tx, so a delivered packet can only have arrived via the underlay receive path.
- (Local) builds: thread TGT_BASE through the buildomat jobs so artifacts can be written to a user-owned directory, and guard the cleanup chown against re-rooting the tree when run elevated locally.
- dtrace: add per-key mcast aggregations for E2E delivery verification. The delivery script tracked multicast events only in coarse rollups (by event, VNI, port, and group), which confirm that traffic moved but cannot attribute a specific group's fanout or loss to a port, next hop, or underlay address. End-to-end emulation testing needs this resolution to assert which sleds and switch ports a flow reached.
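For the Rx-path isolation test, the injected frame carries a Geneve header inside UDP inside IPv6. The minimal Rust sketch that follows builds only the UDP and Geneve layers, using the base-header layout from RFC 8926 (2-bit version, 6-bit option length, O/C flags, 16-bit protocol type, 24-bit VNI) and the IANA Geneve port 6081. The function names are hypothetical, this is not how the OPTE test constructs its frame, and the outer Ethernet/IPv6 headers and the DLPI injection are omitted.

```rust
use std::net::Ipv6Addr;

/// IANA-assigned UDP destination port for Geneve (RFC 8926).
const GENEVE_UDP_PORT: u16 = 6081;
/// Protocol Type for an inner Ethernet frame (Transparent Ethernet Bridging).
const PROTO_TEB: u16 = 0x6558;

/// Hypothetical helper: 8-byte Geneve base header with no flags set.
fn geneve_base_header(vni: u32, opt_len_words: u8) -> [u8; 8] {
    assert!(vni < (1 << 24), "VNI is a 24-bit field");
    assert!(opt_len_words < (1 << 6), "Opt Len is a 6-bit field");
    let mut h = [0u8; 8];
    h[0] = opt_len_words; // Ver (top 2 bits) = 0, Opt Len in 4-byte words
    // h[1]: O and C flags and reserved bits all clear
    h[2..4].copy_from_slice(&PROTO_TEB.to_be_bytes());
    h[4..7].copy_from_slice(&vni.to_be_bytes()[1..]); // low 24 bits
    // h[7]: reserved, zero
    h
}

/// Hypothetical helper: UDP header + Geneve header + inner frame.
/// Returns None unless the outer IPv6 destination is multicast (ff00::/8).
fn udp_geneve(dst: Ipv6Addr, src_port: u16, vni: u32, inner: &[u8]) -> Option<Vec<u8>> {
    if !dst.is_multicast() {
        return None;
    }
    let geneve = geneve_base_header(vni, 0);
    let udp_len = u16::try_from(8 + geneve.len() + inner.len()).ok()?;
    let mut out = Vec::with_capacity(udp_len as usize);
    out.extend_from_slice(&src_port.to_be_bytes());
    out.extend_from_slice(&GENEVE_UDP_PORT.to_be_bytes());
    out.extend_from_slice(&udp_len.to_be_bytes());
    // Checksum left zero in this sketch; a real IPv6 frame needs a valid
    // checksum unless the RFC 6935/6936 zero-checksum conditions apply.
    out.extend_from_slice(&[0, 0]);
    out.extend_from_slice(&geneve);
    out.extend_from_slice(inner);
    Some(out)
}
```

Injecting a frame like this directly, rather than generating it from a guest, is what lets the test rule out a Tx-side delivery.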
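The attribution gap in coarse rollups can be shown outside DTrace. In D, an aggregation such as `@[group, port] = count();` keys its counts by a tuple. The minimal Rust sketch that follows mimics that with a `HashMap`, using a hypothetical `McastEvent` record rather than the script's actual probe arguments. Two different deliveries produce identical single-dimension rollups, and only the combined key tells them apart.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::net::Ipv6Addr;

/// Hypothetical multicast delivery event (not the script's real probe args).
#[derive(Clone, Copy, Debug)]
struct McastEvent {
    group: Ipv6Addr,
    port: u16,
    next_hop: Ipv6Addr,
}

/// Count events by an arbitrary key, analogous to `@[key] = count()` in D.
fn count_by<K: Hash + Eq>(events: &[McastEvent], key: impl Fn(&McastEvent) -> K) -> HashMap<K, u64> {
    let mut agg = HashMap::new();
    for e in events {
        *agg.entry(key(e)).or_insert(0) += 1;
    }
    agg
}
```

For example, group A to port 1 plus group B to port 2 yields the same per-group and per-port counts as group A to port 2 plus group B to port 1; keying on `(group, port, next_hop)` distinguishes them, and adding the underlay address as another tuple element extends the same idea.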