[4.21] Upgrading causes VMs using an ovn-k8s-cni-overlay localnet NAD to lose connectivity due to a missing logical port

    • Incidents & Support
    • False
    • 1
    • Critical
    • None
    • Unspecified
    • Production
    • Customer Escalated
    • Done
    • Bug Fix
      * Before this update, upgrading to {product-title} 4.18 caused VM pods that use `ovn-k8s-cni-overlay` localnet NADs to lose network connectivity. As a consequence, VMs lost network connectivity during the upgrade and required a pod or VM restart to recover. With this release, the upgrade process creates the logical switch port for these VMs during the 4.18 upgrade. As a result, VMs maintain network connectivity during and after upgrading to {product-title} 4.18. (link:https://issues.redhat.com/browse/OCPBUGS-74267[OCPBUGS-74267])

      prevent LSP removal during upgrades for localnet ipamless topologies

      Description of problem:

      While upgrading the cluster to 4.18 from 4.17, after the cluster network operator finishes upgrading to the 4.18 image, VMs begin to lose network connectivity through their ovn-k8s-cni-overlay localnet NADs.

      Restarting the ovnkube-node pod seems to resolve the issue, as does live-migrating the impacted VM. An OVN DB rebuild was not tested because restarting the ovnkube-node pod works.

       

      Running `ovn-nbctl list logical-switch-port` shows that the impacted VM does not have the logical switch port for connectivity.
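The check described above can be scripted. The following is a minimal sketch, assuming the command is run as `ovn-nbctl --format=json --columns=name list Logical_Switch_Port` and that the caller already knows which port names to expect. The function name `missing_logical_ports` and the sample port names are hypothetical and are not taken from the affected cluster.

```python
import json
from typing import Iterable, List


def missing_logical_ports(nbctl_json: str, expected_ports: Iterable[str]) -> List[str]:
    """Return the expected port names that are absent from the ovn-nbctl output.

    nbctl_json is the text printed by ovn-nbctl with --format=json, which is a
    JSON object holding a "headings" list and a "data" list of rows.
    """
    table = json.loads(nbctl_json)
    # Locate the "name" column rather than assuming it is the first one.
    name_col = table["headings"].index("name")
    present = {row[name_col] for row in table["data"]}
    return sorted(p for p in expected_ports if p not in present)


if __name__ == "__main__":
    # Hypothetical sample output: two ports exist, one expected port is missing.
    sample = json.dumps({
        "headings": ["name"],
        "data": [["example-port-a"], ["example-port-b"]],
    })
    print(missing_logical_ports(sample, ["example-port-a", "example-port-c"]))
```

A non-empty result for a running VM would match the symptom reported here. How the expected port name is derived from the VM pod and NAD is left to the caller, since this report does not state the naming scheme.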

      To my knowledge, this has only been reported for VM pods so far; I haven't heard of it happening with non-VM pods using localnet NADs.

       

      Version-Release number of selected component (if applicable):

      4.18.z

      So far this has been seen on 4.18.27 and 4.18.28 specifically, but the affected range is likely wider.

       

      How reproducible:

      Currently unsure. Reproducing it has proved difficult so far, but I am working on a potential reproduction.

       

      Steps to Reproduce:

      1. Create a 4.17.z cluster

      2. Install and configure OpenShift Virtualization

      3. Configure an ovn-k8s-cni-overlay localnet NAD

      4. Create a VM using the NAD configured in step 3

      5. Upgrade to 4.18.z
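For step 3, the NAD can be generated programmatically. The sketch below builds an IPAM-less localnet NetworkAttachmentDefinition manifest as JSON, which `oc apply -f` accepts. The NAD name, namespace, and physical network name are hypothetical placeholders, and the assumption that omitting `subnets` yields the IPAM-less topology mentioned in this report should be checked against the cluster's documentation.

```python
import json


def build_localnet_nad(name: str, namespace: str, physical_network: str) -> dict:
    """Build a localnet NetworkAttachmentDefinition manifest as a dict.

    No "subnets" key is set, so OVN-Kubernetes is not asked to manage IP
    addresses on this network (assumed to be the IPAM-less case).
    """
    cni_config = {
        "cniVersion": "0.3.1",
        # Assumed to match the localnet name used in the node's OVN bridge mapping.
        "name": physical_network,
        "type": "ovn-k8s-cni-overlay",
        "topology": "localnet",
        "netAttachDefName": f"{namespace}/{name}",
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name, "namespace": namespace},
        # The CNI configuration is embedded as a JSON string.
        "spec": {"config": json.dumps(cni_config)},
    }


if __name__ == "__main__":
    # Hypothetical names; replace them with values from the actual environment.
    nad = build_localnet_nad("vm-localnet", "vm-workloads", "physnet1")
    print(json.dumps(nad, indent=2))
```

Saving the printed manifest to a file and applying it gives the VM in step 4 a network to reference by its namespace-qualified NAD name.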

       

      Actual results:

      After the cluster network operator upgrades, connectivity is lost to VMs over their localnet NADs.

       

      Expected results:

      Connectivity to VMs over their localnet NADs is maintained during and after the upgrade.

      Additional info:

      Details on specific testing will be added in the comments.

       

      Affected Platforms:

      OpenShift Container Platform 4.18

              ellorent Felix Enrique Llorente Pastora
              rhn-support-jclarkmu Jade Clark-Muth
              Arti Sood Arti Sood