Bug summary
VNET routes with monitoring=custom_bfd fail to install in the NPU when the primary endpoint is a directly-connected local DPU IP. The route stays inactive in STATE_DB even though the BFD session is Up and the kernel has a valid ARP entry for the DPU on the VLAN interface.
In SmartSwitch HA deployments where hamgrd programs the primary as the local-DPU IP, inbound VIP traffic falls back to the secondary (remote NH via VxLAN tunnel) or fails entirely.
Symptom
$ redis-cli -n 6 hgetall 'VNET_ROUTE_TUNNEL_TABLE|Vnet-default|<prefix>'
'active_endpoints' -> ''
'state' -> 'inactive'
2026 May 21 05:58:53.784109 NOTICE swss#orchagent: :- createNextHopGroup: Next hop 100.117.156.35@ not found in neighorch, skipping.
2026 May 21 05:58:53.784109 WARNING swss#orchagent: :- updateVnetTunnelCustomMonitor: Failed to create primary based custom next hop group. Cannot proceed.
Root cause
NextHopGroupKey for VNET routes is built from VNET_ROUTE_TUNNEL_TABLE.endpoint, which only carries IPs. For a directly-connected local endpoint the NextHopKey ends up with an empty interface alias (<IP>@). NeighOrch::hasNextHop then misses the stored entry <IP>@<intf> (e.g. <IP>@Vlan32), and createNextHopGroup returns false.
The Down path (NeighOrch::updateNextHop from BfdUpdate) already resolves the interface correctly (setNextHopFlag on <IP> seen on port <intf>). The Up / create path just needs to do the same.
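The mismatch can be illustrated with a small model. This is only a sketch under stated assumptions: the dictionary keyed by (IP, alias) and the helper names has_next_hop and resolve_alias are hypothetical stand-ins, not the actual sonic-swss data structures or APIs, and the MAC address is a placeholder. It shows why an exact lookup on <IP>@ misses the stored <IP>@Vlan32 entry, and how resolving the interface by IP first (as the Down path already does) makes the lookup succeed.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, simplified neighbor table keyed by (ip, interface alias).
# The real NeighOrch structures are more involved; this is illustration only.
NeighborTable = Dict[Tuple[str, str], str]


def has_next_hop(neighbors: NeighborTable, ip: str, alias: str = "") -> bool:
    """Exact-match lookup: an empty alias only matches an empty-alias entry."""
    return (ip, alias) in neighbors


def resolve_alias(neighbors: NeighborTable, ip: str) -> Optional[str]:
    """Return the interface alias on which `ip` was learned, or None."""
    for neigh_ip, alias in neighbors:
        if neigh_ip == ip:
            return alias
    return None


# The neighbor is stored as <IP>@Vlan32 (placeholder MAC address).
neighbors: NeighborTable = {("100.117.156.35", "Vlan32"): "aa:bb:cc:dd:ee:ff"}
endpoint = "100.117.156.35"  # VNET_ROUTE_TUNNEL_TABLE.endpoint carries only the IP

# Create path today: key is <IP>@ with an empty alias, so the lookup misses.
exact_hit = has_next_hop(neighbors, endpoint)

# Proposed behaviour: resolve the interface first, then look up <IP>@<intf>.
alias = resolve_alias(neighbors, endpoint)
resolved_hit = alias is not None and has_next_hop(neighbors, endpoint, alias)

print(exact_hit, alias, resolved_hit)  # prints: False Vlan32 True
```

A linear scan is used here purely for brevity; how the real fix locates the interface for a given IP is left to the implementation.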
- SONiC image: 20251110.21 (202511 branch)
Related
- [...] NEIGH_RESOLVE_TABLE keys for VLAN interfaces. This issue is the consumer-side gap in vnetorch.
Why vstest didn't catch it
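The vs tests don't use a VLAN configuration, so they never exercise a local endpoint whose neighbor entry is keyed as <IP>@<intf> on a VLAN interface.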
Why sonic-mgmt didn't catch it
- There is also no assertion on STATE_DB:VNET_ROUTE_TUNNEL_TABLE.state anywhere in tests/.
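A check along these lines would close that gap. The sketch below is hypothetical: route_state_problems is not an existing sonic-mgmt helper, and it assumes that a healthy route reports state 'active' and that active_endpoints is a comma-separated list of IPs. It takes the field dictionary already read from STATE_DB (for example from the hgetall shown under Symptom) and returns a list of problems, so a test could assert that the list is empty.

```python
from typing import Dict, List


def route_state_problems(fields: Dict[str, str],
                         expected_endpoints: List[str]) -> List[str]:
    """Describe why a VNET_ROUTE_TUNNEL_TABLE entry is unhealthy.

    Hypothetical helper. Assumes state == 'active' for a healthy route and
    a comma-separated 'active_endpoints' field.
    """
    problems: List[str] = []

    state = fields.get("state", "")
    if state != "active":
        problems.append(f"state is {state!r}, expected 'active'")

    # Split the endpoint list and drop empty strings (e.g. when the field is '').
    active = [e for e in fields.get("active_endpoints", "").split(",") if e]
    missing = [e for e in expected_endpoints if e not in active]
    if missing:
        problems.append(f"missing active endpoints: {missing}")

    return problems


# Field values as seen in the Symptom section of this report.
broken = {"active_endpoints": "", "state": "inactive"}
# What a healthy entry is assumed to look like.
healthy = {"active_endpoints": "100.117.156.35", "state": "active"}

print(route_state_problems(broken, ["100.117.156.35"]))
print(route_state_problems(healthy, ["100.117.156.35"]))
```

Returning a list of problems rather than a bare boolean keeps the failure message useful when the check is wrapped in an assertion.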
1. How did traffic get forwarded from the NPU to the DPU?