OCPBUGS-74269

[4.19] Upgrading causes VMs using an ovn-k8s-cni-overlay localnet NAD to lose connectivity due to a missing logical port

    • Incidents & Support
    • False
    • 1
    • Critical
    • None
    • Unspecified
    • Production
    • Customer Escalated
    • Proposed
    • Bug Fix
      Before this update, clusters using localnet IPAM-less topologies had their logical switch ports (LSPs) removed during the upgrade process. This caused workloads attached to those ports, such as virtual machines, to lose network connectivity. With this release, the upgrade logic preserves these LSPs throughout the upgrade, so connectivity is maintained. (link:https://issues.redhat.com/browse/OCPBUGS-74269[OCPBUGS-74269])

      Description of problem:

      While upgrading the cluster from 4.17 to 4.18, VMs begin to lose network connectivity through their ovn-k8s-cni-overlay localnet NADs once the cluster network operator finishes upgrading to the 4.18 image.

      Restarting the ovnkube-node pod seems to resolve the issue, as does performing a VM live migration for the impacted VM. No OVN DB rebuild was tested as restarting the ovnkube-node pod works.
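The two workarounds above can be sketched as command builders. This is only an illustration under assumptions not stated in this report: that ovnkube-node pods run in the openshift-ovn-kubernetes namespace with the label app=ovnkube-node, and that virtctl is available for live migration. The node, VM, and namespace names are hypothetical. The code only prints the commands; running them is left to the operator.

```python
import shlex

# Assumption: namespace and label used by ovnkube-node pods on the cluster.
OVN_NAMESPACE = "openshift-ovn-kubernetes"
OVNKUBE_NODE_LABEL = "app=ovnkube-node"


def restart_ovnkube_node_cmd(node: str) -> list[str]:
    """Build the command that deletes the ovnkube-node pod on one node.

    The pod belongs to a DaemonSet, so it is recreated after deletion.
    """
    return [
        "oc", "-n", OVN_NAMESPACE, "delete", "pod",
        "-l", OVNKUBE_NODE_LABEL,
        "--field-selector", f"spec.nodeName={node}",
    ]


def live_migrate_cmd(vm: str, namespace: str) -> list[str]:
    """Build the command that requests a live migration of one VM."""
    return ["virtctl", "migrate", vm, "-n", namespace]


# Hypothetical node and VM names, for illustration only.
print(shlex.join(restart_ovnkube_node_cmd("worker-0")))
print(shlex.join(live_migrate_cmd("vm-a", "vms")))
```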

       

      Running the following command shows that the impacted VM does not have the logical switch port it needs for connectivity:

      ovn-nbctl list logical-switch-port
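The check above can be automated with a minimal sketch like the one below. It assumes the port names were captured with something like `ovn-nbctl --bare --columns=name list Logical_Switch_Port`, which prints one name per line, and that the caller already knows which port names to expect. The port names in the sample are hypothetical and do not come from an affected cluster.

```python
def parse_port_names(listing: str) -> set[str]:
    """Collect port names from output that has one name per line.

    Blank lines (record separators) are skipped, and surrounding
    double quotes are stripped in case the output was not bare.
    """
    names = set()
    for line in listing.splitlines():
        name = line.strip().strip('"')
        if name:
            names.add(name)
    return names


def missing_ports(expected: set[str], listing: str) -> set[str]:
    """Return the expected port names that are absent from the listing."""
    return set(expected) - parse_port_names(listing)


# Hypothetical sample: two ports expected, only one present in the NB DB.
sample_listing = "vms.localnet-nad_vms_virt-launcher-vm-a\n\n"
expected_ports = {
    "vms.localnet-nad_vms_virt-launcher-vm-a",
    "vms.localnet-nad_vms_virt-launcher-vm-b",
}
print(sorted(missing_ports(expected_ports, sample_listing)))
```

Any name this reports would correspond to a VM whose logical switch port is gone, which matches the symptom described here.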

      To my knowledge, this has only been reported for VM pods so far; I haven't heard of it happening with non-VM pods using localnet NADs.

       

      Version-Release number of selected component (if applicable):

      4.18.z

      So far this has been seen on 4.18.27 and 4.18.28 specifically, but it likely affects a wider range of versions.

       

      How reproducible:

      Currently unsure. Reproducing it has proved difficult so far, but I am working on a potential reproduction.

       

      Steps to Reproduce:

      1. Create a 4.17.z cluster

      2. Install and configure OpenShift Virtualization

      3. Configure an ovn-k8s-cni-overlay localnet NAD

      4. Create a VM using the NAD configured in step 3

      5. Upgrade to 4.18.z
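For step 3, a NAD of the kind described might look like the sketch below. It is an assumption-laden illustration, not the configuration from the affected cluster: the NAD name, namespace, network name, and cniVersion are hypothetical, and the "subnets" key is left out on purpose to model an IPAM-less localnet topology. A working localnet setup also needs a matching bridge mapping on the nodes, which is not shown.

```python
import json


def localnet_nad(name: str, namespace: str, network_name: str) -> dict:
    """Build a NetworkAttachmentDefinition for an IPAM-less localnet network."""
    config = {
        "cniVersion": "0.3.1",  # assumption: any CNI version the cluster accepts
        "name": network_name,   # assumption: must match the node bridge mapping
        "type": "ovn-k8s-cni-overlay",
        "topology": "localnet",
        "netAttachDefName": f"{namespace}/{name}",
        # No "subnets" key: OVN-Kubernetes assigns no IPs (IPAM-less).
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"config": json.dumps(config)},
    }


# Hypothetical names, for illustration only.
print(json.dumps(localnet_nad("localnet-nad", "vms", "physnet"), indent=2))
```

The printed JSON can be saved to a file and applied with oc apply -f, since the API accepts JSON manifests as well as YAML.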

       

      Actual results:

      After the cluster network operator upgrades, connectivity is lost to VMs over their localnet NADs.

       

      Expected results:

      Connectivity for VMs over their localnet NADs is maintained during and after the upgrade.

      Additional info:

      Details on specific testing will be added in the comments.

       

      Affected Platforms:

      OpenShift Container Platform 4.18

              ellorent Felix Enrique Llorente Pastora
              rhn-support-jclarkmu Jade Clark-Muth
              Arti Sood Arti Sood