[bgp] Fix native OVN BGP CI workarounds #4004

Open
eduolivares wants to merge 2 commits into openstack-k8s-operators:main from eduolivares:bgp-workarounds-osprh-30900-30905

@eduolivares eduolivares commented Jun 17, 2026

Summary

  • Remove the provider network gateway IP (192.168.133.1) from the
    router loopback in prepare-bgp-spines-leaves.yaml. This IP was
    needed with ovn-bgp-agent, but with native OVN BGP the ping reply
    is dropped by an OVN anti-loop flow in lr_in_ip_input. BGP routing
    does not depend on this loopback entry.
  • Add a play to restart neutron pods after BGP compute configuration in
    prepare-bgp-computes.yaml. After a fresh deployment the BGP
    reconciler's full_sync() can be skipped if the OVSDB lock is not
    yet held, leaving arp_proxy unset on interconnect LSPs. A pod
    restart triggers a new full_sync(). This workaround should be
    removed once the bug is fixed in neutron.

Related-Issue: #OSPRH-30905
Related-Issue: #OSPRH-30900
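
The skipped `full_sync()` described in the second bullet can be pictured with a toy model. This is only a sketch under assumptions: the `Reconciler` class, its `start()` method, and the `restart_pod()` helper are hypothetical names invented for illustration and are not neutron's actual code. It assumes the sync is attempted once at startup and that the OVSDB lock is held by the time a restarted pod starts.

```python
class Reconciler:
    """Toy model (hypothetical) of a reconciler that syncs once at startup."""

    def __init__(self, lock_held_at_startup):
        self.arp_proxy_set = False
        self.start(lock_held_at_startup)

    def start(self, lock_held):
        # The sync is attempted exactly once. If the lock is not held yet,
        # it is skipped and, as in the reported bug, never retried.
        if lock_held:
            self.full_sync()

    def full_sync(self):
        # Stand-in for the real work; here it only records that
        # arp_proxy was configured on the interconnect LSPs.
        self.arp_proxy_set = True


def restart_pod(lock_held=True):
    """A restart creates a fresh reconciler, so the startup sync runs again."""
    return Reconciler(lock_held_at_startup=lock_held)


# Fresh deployment: the lock is not held yet, so the sync is skipped.
fresh = Reconciler(lock_held_at_startup=False)
# Workaround: restarting once the lock is available completes the setup.
restarted = restart_pod(lock_held=True)
```

In this model `fresh.arp_proxy_set` stays `False` indefinitely, while `restarted.arp_proxy_set` is `True`, which is the effect the pod restart relies on.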

Test plan

  • Deploy a BGP environment with native OVN BGP
  • Verify that the spines/leaves playbook completes without the
    loopback gateway IP task
  • Verify that neutron pods are restarted and reach Ready state
  • Verify that OpenStackControlPlane reconciles successfully
  • Confirm arp_proxy is set on interconnect LSPs after the restart
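
The last check could be scripted along the following lines. This is a hedged sketch, not part of the PR: it assumes the port table was dumped with something like `ovn-nbctl --format=json --columns=name,options list Logical_Switch_Port`, and the port names and the `arp_proxy` value in `SAMPLE` are made up for illustration. How the interconnect LSPs are identified is left to the caller.

```python
import json


def ovsdb_map_to_dict(value):
    """Decode an OVSDB JSON map of the form ["map", [[key, value], ...]]."""
    if isinstance(value, list) and len(value) == 2 and value[0] == "map":
        return dict(value[1])
    return {}


def lsps_missing_arp_proxy(nbctl_json, lsp_names):
    """Return the names in lsp_names that have no arp_proxy option set."""
    table = json.loads(nbctl_json)
    name_col = table["headings"].index("name")
    opts_col = table["headings"].index("options")
    options_by_name = {
        row[name_col]: ovsdb_map_to_dict(row[opts_col]) for row in table["data"]
    }
    # A port that is absent from the dump is also reported as missing.
    return [
        name
        for name in lsp_names
        if not options_by_name.get(name, {}).get("arp_proxy")
    ]


# Illustrative data only: hypothetical port names and option value.
SAMPLE = json.dumps(
    {
        "headings": ["name", "options"],
        "data": [
            ["lsp-a", ["map", [["arp_proxy", "192.168.133.1"]]]],
            ["lsp-b", ["map", []]],
        ],
    }
)

missing = lsps_missing_arp_proxy(SAMPLE, ["lsp-a", "lsp-b", "lsp-c"])
```

With the sample above, `missing` contains `lsp-b` (no option set) and `lsp-c` (not present in the dump); an empty list would mean every listed port has `arp_proxy` configured.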

Assisted-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eduolivares and others added 2 commits June 17, 2026 15:57
The gateway IP (e.g. `192.168.133.1`) was configured on the router
loopback interface so that VMs could ping the external subnet gateway.
This worked with `ovn-bgp-agent`, but with native OVN BGP (which
replaces `ovn-bgp-agent` in RHOSO) pinging the gateway IP fails:
OVN's `arp_proxy` answers the ARP request and the ICMP echo request
reaches the router, but an anti-loop flow in `lr_in_ip_input` drops the
reply because its source IP matches the router port address. Since BGP
routing does not depend on this loopback entry, remove it.

Related-Issue: #OSPRH-30905

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Olivares <eolivare@redhat.com>

After a fresh deployment, the BGP reconciler's `full_sync()` can be
skipped if the OVSDB lock is not yet held at startup, and it is never
retried. This leaves `arp_proxy` unset on interconnect LSPs. Restarting
the neutron pods triggers a new `full_sync()` that completes the setup.

This workaround should be removed once the bug is fixed in neutron.

Related-Issue: #OSPRH-30900

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Eduardo Olivares <eolivare@redhat.com>

openshift-ci Bot commented Jun 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign eduolivares for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/b308403f7c2c407d850e7436d0a57577

✔️ openstack-k8s-operators-content-provider SUCCESS in 22m 34s
podified-multinode-edpm-deployment-crc RETRY_LIMIT in 5m 28s
cifmw-crc-podified-edpm-baremetal NODE_FAILURE Node(set) request 099-0000122295 failed in 0s
cifmw-crc-podified-edpm-baremetal-minor-update NODE_FAILURE Node(set) request 099-0000122296 failed in 0s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 54s
✔️ cifmw-pod-pre-commit SUCCESS in 9m 59s
