[ecmp] Deduplicate redundant multicast next hops via per-target ECMP selection #1012

Open

zeeshanlakhani wants to merge 2 commits into master from zl/mcast-tx-ecmp-select-one

@zeeshanlakhani commented Jun 29, 2026

The multicast forwarding loop emitted a copy to every programmed next hop. A next hop is a switch endpoint that fans the packet out to every sled (underlay replication) or out the front panel (external replication) behind its port groups. Redundant next hops that share a replication target therefore reach the same destinations and deliver the flow twice. Because a receiver cannot tell the duplicates apart, we deduplicate at Tx.

This PR adds `select_nexthops`, which resolves one next hop per replication target in two passes over the forwarding table. The forwarding loop then composes the two selections into each hop's effective replication. A hop with `Both` replication is narrowed to a single target when it is chosen for one target but not the other, and it is skipped when it is chosen for neither. All next hops remain programmed for failover.
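
To make the two-pass selection and its composition concrete, here is a minimal Rust sketch. It is not the code in this PR. The `Replication` and `NextHop` types, the `effective_replication` helper, and the ECMP choice (a per-flow hash modulo the candidate count) are all assumptions made for illustration. Only the name `select_nexthops` and the `Both` variant come from the description above.

```rust
/// Replication targets a next hop can serve (hypothetical types for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Replication {
    External,
    Underlay,
    Both,
}

impl Replication {
    fn covers_external(self) -> bool {
        matches!(self, Replication::External | Replication::Both)
    }
    fn covers_underlay(self) -> bool {
        matches!(self, Replication::Underlay | Replication::Both)
    }
}

#[derive(Clone, Debug)]
struct NextHop {
    name: &'static str,
    replication: Replication,
}

/// Sketch: one pass per target over the table. Each pass keeps the hops that
/// cover the target and picks one by `flow_hash % candidates` (assumed ECMP rule).
/// Returns (external index, underlay index).
fn select_nexthops(hops: &[NextHop], flow_hash: u64) -> (Option<usize>, Option<usize>) {
    let pick = |covers: fn(Replication) -> bool| {
        let candidates: Vec<usize> = hops
            .iter()
            .enumerate()
            .filter(|(_, h)| covers(h.replication))
            .map(|(i, _)| i)
            .collect();
        if candidates.is_empty() {
            None
        } else {
            Some(candidates[(flow_hash % candidates.len() as u64) as usize])
        }
    };
    (pick(Replication::covers_external), pick(Replication::covers_underlay))
}

/// Compose both selections into what hop `i` actually emits, or `None` to skip it.
/// A `Both` hop chosen for only one target is narrowed to that target.
fn effective_replication(i: usize, ext: Option<usize>, und: Option<usize>) -> Option<Replication> {
    match (ext == Some(i), und == Some(i)) {
        (true, true) => Some(Replication::Both),
        (true, false) => Some(Replication::External),
        (false, true) => Some(Replication::Underlay),
        (false, false) => None,
    }
}

fn main() {
    let hops = [
        NextHop { name: "switch0", replication: Replication::Both },
        NextHop { name: "switch1", replication: Replication::Underlay },
    ];
    let (ext, und) = select_nexthops(&hops, 7);
    for (i, hop) in hops.iter().enumerate() {
        match effective_replication(i, ext, und) {
            Some(r) => println!("{} -> {:?}", hop.name, r),
            None => println!("{} -> skipped", hop.name),
        }
    }
}
```

This example uses a flow hash of 7 with one `Both` hop and one underlay-only hop. The external pass has a single candidate, index 0. The underlay pass picks index 7 % 2 = 1. As a result, `switch0` is narrowed to `External` and `switch1` carries the underlay copy. Both passes hash the same flow, so two identical `Both` hops resolve to the same hop for both targets: that hop keeps `Both`, and the other is skipped.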

Also included:

Tests: added an Rx-path isolation test that drives `handle_mcast_rx` directly by injecting a raw Geneve-over-IPv6 multicast frame over DLPI, with no guest Tx. Any delivered packet can therefore only have arrived via the underlay receive path.
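
The isolation test injects a Geneve-over-IPv6 multicast frame. The hypothetical Rust sketch below lays out such a frame byte by byte:

- an Ethernet header carrying the `33:33` IPv6 multicast MAC (RFC 2464);
- a 40-byte IPv6 header;
- a UDP header addressed to port 6081;
- an 8-byte Geneve base header (RFC 8926).

The names `build_frame` and `geneve_header` are invented for illustration. The sketch carries no Geneve options, so any project-specific options the real test uses are omitted, and the UDP checksum is left zero. The test's actual frame construction may differ.

```rust
use std::net::Ipv6Addr;

const GENEVE_PORT: u16 = 6081; // IANA-assigned Geneve UDP port
const ETHERTYPE_IPV6: u16 = 0x86DD;
const GENEVE_PROTO_ETHERNET: u16 = 0x6558; // Transparent Ethernet Bridging

/// Geneve base header (RFC 8926) with no options.
fn geneve_header(vni: u32) -> [u8; 8] {
    assert!(vni < (1 << 24), "VNI is 24 bits");
    let v = vni.to_be_bytes();
    let p = GENEVE_PROTO_ETHERNET.to_be_bytes();
    // Ver=0 and Opt Len=0, then O=0 and C=0, protocol type, 24-bit VNI, reserved.
    [0, 0, p[0], p[1], v[1], v[2], v[3], 0]
}

/// Hypothetical builder: Ethernet + IPv6 + UDP + Geneve around `inner`,
/// which stands in for the encapsulated inner Ethernet frame.
fn build_frame(src_mac: [u8; 6], src: Ipv6Addr, dst: Ipv6Addr, vni: u32, inner: &[u8]) -> Vec<u8> {
    assert!(dst.is_multicast(), "destination must be in ff00::/8");
    let d = dst.octets();
    let udp_len = (8 + 8 + inner.len()) as u16; // UDP header + Geneve header + inner

    let mut f = Vec::with_capacity(14 + 40 + udp_len as usize);
    // Ethernet: the IPv6 multicast MAC is 33:33 plus the group's low 32 bits.
    f.extend_from_slice(&[0x33, 0x33, d[12], d[13], d[14], d[15]]);
    f.extend_from_slice(&src_mac);
    f.extend_from_slice(&ETHERTYPE_IPV6.to_be_bytes());
    // IPv6 fixed header: version 6, zero traffic class and flow label.
    f.extend_from_slice(&[0x60, 0, 0, 0]);
    f.extend_from_slice(&udp_len.to_be_bytes()); // payload length
    f.push(17); // next header: UDP
    f.push(64); // hop limit
    f.extend_from_slice(&src.octets());
    f.extend_from_slice(&d);
    // UDP: source port fixed here (senders usually vary it per flow for entropy).
    f.extend_from_slice(&GENEVE_PORT.to_be_bytes());
    f.extend_from_slice(&GENEVE_PORT.to_be_bytes());
    f.extend_from_slice(&udp_len.to_be_bytes());
    f.extend_from_slice(&[0, 0]); // checksum left zero in this sketch
    f.extend_from_slice(&geneve_header(vni));
    f.extend_from_slice(inner);
    f
}

fn main() {
    let group: Ipv6Addr = "ff04::1:2".parse().unwrap();
    let src: Ipv6Addr = "fd00::1".parse().unwrap();
    let frame = build_frame([0x02, 0, 0, 0, 0, 1], src, group, 77, &[0u8; 14]);
    println!("{} bytes", frame.len());
}
```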

(Local) builds: we thread `TGT_BASE` through the buildomat jobs so that artifacts can be written to a user-owned directory. We also guard the cleanup `chown` against re-rooting the tree when the jobs are run elevated locally.

dtrace: added per-key mcast aggregations for E2E delivery verification. The delivery script previously tracked multicast events only in coarse rollups (by event, VNI, port, and group). These rollups confirm that traffic moved, but they cannot attribute a specific group's fanout or loss to a port, next hop, or underlay address. End-to-end emulation testing needs this resolution to assert which sleds and switch ports a flow reached.
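
As a rough illustration of why per-key aggregation helps, this Rust sketch mimics what a DTrace aggregation keyed on several values records (`@name[k1, k2, ...] = count();`). The `McastEvent` fields and sample values are hypothetical, not the script's actual probe arguments. In the sample, two copies of one group reach the same underlay address through different next hops. A by-group rollup reports only a count of 2, while the per-key view attributes each copy to its next hop.

```rust
use std::collections::BTreeMap;
use std::net::Ipv6Addr;

/// One delivery event as a probe might report it (hypothetical fields).
struct McastEvent {
    group: Ipv6Addr,
    port: &'static str,
    next_hop: &'static str,
    underlay: Ipv6Addr,
}

/// Coarse rollup: count per group only.
fn by_group(events: &[McastEvent]) -> BTreeMap<Ipv6Addr, u64> {
    let mut m = BTreeMap::new();
    for e in events {
        *m.entry(e.group).or_insert(0) += 1;
    }
    m
}

type Key = (Ipv6Addr, &'static str, &'static str, Ipv6Addr);

/// Per-key aggregation: count per (group, port, next hop, underlay address).
fn by_key(events: &[McastEvent]) -> BTreeMap<Key, u64> {
    let mut m = BTreeMap::new();
    for e in events {
        *m.entry((e.group, e.port, e.next_hop, e.underlay)).or_insert(0) += 1;
    }
    m
}

fn main() {
    let g: Ipv6Addr = "ff04::1:2".parse().unwrap();
    let sled: Ipv6Addr = "fd00:1::1".parse().unwrap();
    let events = [
        McastEvent { group: g, port: "port0", next_hop: "nh-a", underlay: sled },
        McastEvent { group: g, port: "port0", next_hop: "nh-b", underlay: sled },
    ];
    // The rollup shows only "2" for the group; the keyed view shows which hops.
    println!("by group: {:?}", by_group(&events));
    for (k, n) in by_key(&events) {
        println!("{:?} -> {}", k, n);
    }
}
```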

@zeeshanlakhani changed the title from Zl/mcast tx ecmp select one to [ecmp] Deduplicate redundant multicast next hops via per-target ECMP selection on Jun 29, 2026