
MIGSOFTWAR-41070: dhcp4relay drops packets on first VLANs in large batch #112

Open: cshivashgit wants to merge 1 commit into sonic-net:master from cshivashgit:dhcp4relay-fix-source-intf-race-on-batch


@cshivashgit

Contributor

When a large batch of DHCPV4_RELAY VLANs is configured alongside their underlying VLAN_INTERFACEs and a Loopback used as source_interface, the first ~10 VLANs silently drop relayed DHCP packets with "No IPv4 address configured".

Root cause: prepare_relay_interface_config() resolves the source interface IP via getifaddrs(). At scale, intfmgrd has not yet programmed the Loopback IP into the kernel by the time the DHCPV4_RELAY config arrives, so getifaddrs() misses it, src_intf_sel_addr stays at 0.0.0.0, and from_client() drops every packet for those VLANs. The dhcp4relay process already knows the authoritative source IP at that point: the *INTERFACE pub/sub event has either landed in the SubscriberStateTable buffer or already passed through process_interface_notification, which found no matching vlan in vlans_copy. Only the kernel state is behind.
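To illustrate the failure mode, here is a minimal sketch of a getifaddrs()-based lookup. The helper names `kernel_ipv4_for` and `resolve_or_zero` are hypothetical and are not taken from the dhcp4relay sources; the real prepare_relay_interface_config() may be structured differently. The sketch only shows how a lookup that runs before the kernel has the address ends up with 0.0.0.0.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <string>

// Hypothetical helper (not from the PR): writes the first IPv4 address the
// kernel currently reports for `ifname` into `out` and returns true, or
// returns false if the kernel has no IPv4 address for that interface yet.
bool kernel_ipv4_for(const std::string &ifname, in_addr &out) {
    ifaddrs *head = nullptr;
    if (getifaddrs(&head) != 0) {
        return false;  // the lookup itself failed
    }
    bool found = false;
    for (ifaddrs *ifa = head; ifa != nullptr; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == nullptr) continue;             // no address on this entry
        if (ifa->ifa_addr->sa_family != AF_INET) continue;  // IPv4 only
        if (ifname != ifa->ifa_name) continue;              // different interface
        out = reinterpret_cast<const sockaddr_in *>(ifa->ifa_addr)->sin_addr;
        found = true;
        break;
    }
    freeifaddrs(head);
    return found;
}

// Assumed shape of the failing path: a miss leaves the address at 0.0.0.0.
in_addr resolve_or_zero(const std::string &ifname) {
    in_addr addr{};
    addr.s_addr = htonl(INADDR_ANY);  // 0.0.0.0
    kernel_ipv4_for(ifname, addr);    // addr is untouched on a miss
    return addr;
}
```

Calling `resolve_or_zero` for an interface whose address intfmgrd has not yet programmed returns 0.0.0.0, which is the value that makes from_client() drop the packet.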

Fix: maintain an in-process intf_to_addr_cache map populated as a side-effect of process_interface_notification, and dispatch DHCPv4_RELAY_INTERFACE_UPDATE from that cache when a DHCPV4_RELAY batch registers a vlan whose source_interface IP was cached but never replayed. No Redis access on the hot path.
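A rough sketch of the cache idea follows. Only the name intf_to_addr_cache comes from the description above; the `SourceIntfCache` class, the `InterfaceUpdate` struct, the string-based address type, and the method signatures are assumptions made for illustration. As a simplification, the sketch replays every cached match instead of tracking which entries were already replayed.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative stand-in for a DHCPv4_RELAY_INTERFACE_UPDATE event payload.
struct InterfaceUpdate {
    std::string vlan;
    std::string source_interface;
    std::string ipv4;
};

// Hypothetical wrapper around the in-process cache described in the PR.
class SourceIntfCache {
public:
    // Called for every *INTERFACE event, whether or not a vlan uses it yet.
    void remember(const std::string &intf, const std::string &ipv4) {
        intf_to_addr_cache_[intf] = ipv4;
    }

    // Called when the interface address is removed.
    void forget(const std::string &intf) { intf_to_addr_cache_.erase(intf); }

    // Called after a DHCPV4_RELAY batch registers vlans. Returns one update
    // per vlan whose source_interface already has a cached address. This is
    // a pure in-memory lookup; nothing here touches Redis.
    std::vector<InterfaceUpdate> dispatch_from_cache(
        const std::map<std::string, std::string> &vlan_to_source_intf) const {
        std::vector<InterfaceUpdate> updates;
        for (const auto &entry : vlan_to_source_intf) {
            auto it = intf_to_addr_cache_.find(entry.second);
            if (it != intf_to_addr_cache_.end()) {
                updates.push_back({entry.first, entry.second, it->second});
            }
        }
        return updates;
    }

private:
    std::map<std::string, std::string> intf_to_addr_cache_;  // interface -> IPv4
};
```

Vlans whose source_interface has no cached address produce no update and keep relying on the existing kernel-driven INTERFACE flow.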

Wired in two places:

  • In the startup predrain block of handle_swss_notification(), after process_relay_notification(initial_entries) and before the DHCPv4_RELAY_SYNC_BARRIER write, drain the *INTERFACE SubscriberStateTables and feed them through process_interface_notification. The buffers were seeded by their construction-time SCAN, so this both populates the cache and dispatches INTERFACE_UPDATE for vlans registered by the relay drain immediately above.

  • In the runtime select loop, immediately after the DHCPV4_RELAY branch's process_relay_notification(entries) call, replay matching cache entries via dispatch_source_intf_from_cache() so a vlan whose source_interface IP arrived earlier (and was discarded by process_interface_notification because vlans_copy didn't yet contain the vlan) still receives a correct src_intf_sel_addr before any packet reaches prepare_relay_interface_config().
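The ordering of the two call sites can be modelled with a small in-memory simulation. The function names mirror those in the description, but their signatures, the `RelayState` struct, the `Entries` alias, and the `startup_predrain` and `on_runtime_relay_batch` wrappers are assumptions made for illustration, not the actual code in this PR. The addresses used with it are made-up example values.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// (key, value) pairs: vlan -> source_interface, or interface -> IPv4.
using Entries = std::vector<std::pair<std::string, std::string>>;

struct RelayState {
    std::map<std::string, std::string> vlans_copy;          // vlan -> source_interface
    std::map<std::string, std::string> intf_to_addr_cache;  // interface -> IPv4
    std::map<std::string, std::string> src_intf_sel_addr;   // vlan -> IPv4 in use
    bool barrier_written = false;
};

// Registers the vlans carried by a DHCPV4_RELAY batch.
void process_relay_notification(RelayState &s, const Entries &relay) {
    for (const auto &e : relay) s.vlans_copy[e.first] = e.second;
}

// Caches every interface address, then updates only vlans already registered.
void process_interface_notification(RelayState &s, const Entries &intfs) {
    for (const auto &e : intfs) {
        s.intf_to_addr_cache[e.first] = e.second;
        for (const auto &v : s.vlans_copy) {
            if (v.second == e.first) s.src_intf_sel_addr[v.first] = e.second;
        }
    }
}

// Replays cached addresses for vlans that registered after the event passed.
void dispatch_source_intf_from_cache(RelayState &s) {
    for (const auto &v : s.vlans_copy) {
        auto it = s.intf_to_addr_cache.find(v.second);
        if (it != s.intf_to_addr_cache.end()) s.src_intf_sel_addr[v.first] = it->second;
    }
}

// Startup: relay drain first, then the *INTERFACE drain, then the barrier.
void startup_predrain(RelayState &s, const Entries &initial_relay,
                      const Entries &buffered_intfs) {
    process_relay_notification(s, initial_relay);
    process_interface_notification(s, buffered_intfs);
    s.barrier_written = true;  // stands in for the DHCPv4_RELAY_SYNC_BARRIER write
}

// Runtime: replay the cache right after each DHCPV4_RELAY batch.
void on_runtime_relay_batch(RelayState &s, const Entries &relay) {
    process_relay_notification(s, relay);
    dispatch_source_intf_from_cache(s);
}
```

In this model, an interface event that arrives before its vlan is registered only populates the cache; the vlan picks up the address when `on_runtime_relay_batch` later runs, without waiting for another kernel-driven event.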

INTERFACE_UPDATE is the same event type produced by process_interface_notification, so the main thread handler in dhcp4relay.cpp needs no change. The replay is idempotent: if the existing kernel-driven INTERFACE flow later delivers another event for the same interface, it simply rewrites the same src_intf_sel_addr.

Signed-off-by: Shivashankar CR <shivashankar.c.r@gmail.com>
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
