OpenShift Bugs / OCPBUGS-74262

OVS bridge (br-ex) intermittently binds to incorrect secondary interface on Dual-NIC Infrastructure nodes after reboot (Race Condition)

    • Important
    • x86_64
    • Production

      Description of problem:

      We are experiencing an issue in a fresh installation of OpenShift 4.16.32 (UPI, Platform: None) on Infrastructure nodes configured with Dual NICs. The network configuration is intended as follows:

      • Primary Interface (ens5): OCP Cluster Network / Management.
      • Secondary Interface (ens6): Dedicated storage access for CSI Driver deployment.

      After node reboots, the OVS bridge (br-ex) configuration flips between two states. On roughly 50% of boots, br-ex correctly bridges the primary interface (ens5). On the other 50%, br-ex incorrectly enslaves the secondary storage interface (ens6) and takes the storage IP, leaving the primary OCP interface (ens5) as a standalone NIC outside the bridge.

      This appears to be a race condition during the node startup/NetworkManager initialization where OVN-Kubernetes or the OVS configuration scripts select the wrong interface to attach to br-ex.
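
One way to narrow down the suspected race is to check whether both NICs offer a default route at boot, since that would make the interface choice ambiguous. The sketch below is a hypothetical helper, not part of any OpenShift tooling. It assumes, without confirmation from this report, that bridge interface selection depends on the default route, and it parses the JSON printed by `ip -j route show default`.

```python
import json


def default_route_devices(ip_route_json):
    """Return (metric, dev) pairs for default routes, lowest metric first.

    Input is the JSON text printed by `ip -j route show default`.
    A route without an explicit metric is treated as metric 0.
    """
    routes = json.loads(ip_route_json)
    return sorted(
        (r.get("metric", 0), r["dev"])
        for r in routes
        if r.get("dst") == "default"
    )


def selection_is_ambiguous(ip_route_json):
    """True when more than one interface carries a default route."""
    devices = {dev for _, dev in default_route_devices(ip_route_json)}
    return len(devices) > 1


# Sample input for illustration only; gateways and metrics are made up.
SAMPLE = json.dumps([
    {"dst": "default", "gateway": "100.88.0.1", "dev": "ens5", "metric": 101},
    {"dst": "default", "gateway": "1.100.76.1", "dev": "ens6", "metric": 100},
])
```

If `selection_is_ambiguous` returns True on an affected node, comparing the metrics across a good boot and a bad boot may show whether their ordering changes between reboots.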

      Version-Release number of selected component (if applicable):

      4.16, 4.18, 4.20

      How reproducible:

      Intermittent (approx. 50% chance on reboot).

      Steps to Reproduce:

      1. Install an OCP 4.16.32 cluster (UPI method).

      2. Configure Infrastructure nodes with two physical network interfaces (e.g., ens5 for OCP, ens6 for Storage).

      3. Reboot the Infrastructure node.

      4. Check the status of the OVS bridge and interfaces using ip a.

      5. If correct, reboot again until the issue appears.
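
The check in step 4 can be automated so that repeated reboots are easier to tally. The following is a minimal sketch, assuming `ovs-vsctl list-ports br-ex` is available on the node and that the only non-physical ports on br-ex are patch ports whose names start with `patch-`. The helper names are hypothetical, and `ens5` is taken from this report.

```python
import subprocess

# Interface name taken from this report: br-ex should carry ens5, never ens6.
EXPECTED_PORT = "ens5"


def physical_ports(list_ports_output):
    """Return the ports printed by `ovs-vsctl list-ports`, minus patch ports."""
    ports = [line.strip() for line in list_ports_output.splitlines() if line.strip()]
    return [p for p in ports if not p.startswith("patch-")]


def bridge_is_correct(list_ports_output, expected=EXPECTED_PORT):
    """True when the expected NIC is the only physical port on the bridge."""
    return physical_ports(list_ports_output) == [expected]


def check_node(bridge="br-ex"):
    """Run ovs-vsctl on the local node and report whether the bridge is correct."""
    result = subprocess.run(
        ["ovs-vsctl", "list-ports", bridge],
        capture_output=True, text=True, check=True,
    )
    return bridge_is_correct(result.stdout)
```

Running `check_node()` after each reboot (for example from a debug shell on the node) gives a simple pass/fail signal to record alongside the boot count.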

      Actual results:

      Upon a problematic reboot, the OVS bridge br-ex enslaves the secondary interface (ens6) and adopts the Storage Network IP (e.g., 1.100.76.x). The Ingress pods on that node subsequently report the wrong IP address, breaking external traffic flow.

      Expected results:

      The OVS bridge br-ex should consistently and persistently bind only to the primary interface (ens5) and hold the OCP Network IP (e.g., 100.88.x.x), regardless of reboots. The secondary interface should remain independent.
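
The expected and actual results can be told apart by the network that the br-ex address belongs to. The sketch below classifies an address using Python's `ipaddress` module. The prefix lengths are assumptions: the report gives only `100.88.x.x` and `1.100.76.x`, so `/16` and `/24` are guesses that should be replaced with the real subnets.

```python
import ipaddress

# Assumed prefixes; the report only shows partial addresses.
OCP_NET = ipaddress.ip_network("100.88.0.0/16")
STORAGE_NET = ipaddress.ip_network("1.100.76.0/24")


def classify_bridge_ip(addr):
    """Return 'ocp', 'storage', or 'unknown' for the address held by br-ex."""
    ip = ipaddress.ip_address(addr)
    if ip in OCP_NET:
        return "ocp"
    if ip in STORAGE_NET:
        return "storage"
    return "unknown"
```

A result of `storage` for the br-ex address corresponds to the failure described under "Actual results".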

      Additional info:

      Affected Platforms:

      Is it a customer issue / SD:

      The customer is performing a fresh UPI installation of OCP 4.16.32 on Dual-NIC Infrastructure nodes. The goal is to separate OCP management traffic (primary NIC ens5) from Storage traffic (secondary NIC ens6).

      The Issue: After reboots, there is a race condition where the OVS bridge br-ex arbitrarily binds to the wrong interface.

      Expected: br-ex binds to ens5 (Primary/OCP Network).

      Actual: br-ex often binds to ens6 (Storage Network), causing the OVS bridge to take the Storage IP address.

      This creates a split-brain networking scenario where the Ingress Routers (hosted on these infra nodes) advertise the wrong IP address or become unreachable on the expected management network. The issue toggles intermittently with reboots.

              bnemec@redhat.com Benjamin Nemec
              rhn-support-mangarci Manrique García
              Ross Brattain Ross Brattain