OCPBUGS-66994 (OpenShift Bugs)

[4.18] Upgrading to 4.18 causes VMs using an ovn-k8s-cni-overlay localnet NAD to lose connectivity due to a missing logical port


    • Critical
    • Production
    • Customer Escalated

      Description of problem:

      During an upgrade from 4.17 to 4.18, once the cluster network operator has finished upgrading to the 4.18 image, VMs begin to lose network connectivity through their ovn-k8s-cni-overlay localnet NADs.

      Restarting the ovnkube-node pod seems to resolve the issue, as does live migrating the impacted VM. An OVN DB rebuild was not tested because restarting the ovnkube-node pod works.
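The two workarounds above could be scripted roughly as follows. This is only a sketch: it assumes the `oc` and `virtctl` CLIs are available and logged in, that the ovnkube-node pods run in the `openshift-ovn-kubernetes` namespace with the label `app=ovnkube-node`, and the helper names, node name, and VM name are hypothetical. The functions only build the command lines; nothing is executed until they are passed to something like `subprocess.run`.

```python
from typing import List

# Assumption: namespace used by OVN-Kubernetes on OpenShift.
OVN_NAMESPACE = "openshift-ovn-kubernetes"


def restart_ovnkube_node_cmd(node_name: str) -> List[str]:
    """Build an `oc` command that deletes the ovnkube-node pod on one node.

    The DaemonSet recreates the deleted pod, which amounts to a restart.
    """
    return [
        "oc", "delete", "pod",
        "-n", OVN_NAMESPACE,
        "-l", "app=ovnkube-node",
        "--field-selector", f"spec.nodeName={node_name}",
    ]


def live_migrate_cmd(vm_name: str, namespace: str) -> List[str]:
    """Build a `virtctl` command that live migrates one VM."""
    return ["virtctl", "migrate", vm_name, "-n", namespace]
```

For example, `subprocess.run(restart_ovnkube_node_cmd("worker-0"), check=True)` would restart the pod on a node named `worker-0`. Restarting per node rather than cluster-wide keeps the disruption limited to the node hosting the impacted VM.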

       

      Running 
      ovn-nbctl list logical-switch-port
      shows that the impacted VM does not have the logical switch port for connectivity.
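To check several VMs at once, the same listing can be compared against a set of expected port names. The sketch below assumes `ovn-nbctl` is run somewhere it can reach the northbound database, and it uses the standard `--bare --columns=name` output options so that each record prints only its name. The expected port names are supplied by the caller; the names in the usage example are hypothetical and do not reflect the actual naming scheme.

```python
import subprocess
from typing import Iterable, List, Set


def parse_port_names(bare_output: str) -> Set[str]:
    """Collect port names from `--bare --columns=name` output (one name per record)."""
    return {line.strip() for line in bare_output.splitlines() if line.strip()}


def missing_ports(expected: Iterable[str], bare_output: str) -> List[str]:
    """Return the expected port names that are absent from the listing."""
    present = parse_port_names(bare_output)
    return sorted(name for name in expected if name not in present)


def list_port_names() -> str:
    """Run ovn-nbctl and return its raw output; requires access to the NB database."""
    result = subprocess.run(
        ["ovn-nbctl", "--bare", "--columns=name", "list", "Logical_Switch_Port"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

A call such as `missing_ports(["ns1_vm-a", "ns1_vm-b"], list_port_names())` returns the names that have no logical switch port, which should be empty on a healthy node.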

      To my knowledge, this has only been reported for VM pods so far; I haven't heard of it happening with non-VM pods using localnet NADs.

       

      Version-Release number of selected component (if applicable):

      4.18.z

      So far this has been seen on 4.18.27 and 4.18.28 specifically, but the affected range is likely wider.

       

      How reproducible:

      Currently unsure. Reproducing it has proved difficult so far, but I am working on a potential reproduction.

       

      Steps to Reproduce:

      1. Create a 4.17.z cluster

      2. Install and configure OpenShift Virtualization

      3. Configure an ovn-k8s-cni-overlay localnet NAD

      4. Create a VM using the NAD configured in step 3

      5. Upgrade to 4.18.z
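For step 3, a localnet NAD of the kind described might look like the manifest generated below. This is a minimal sketch, not the configuration from the affected cluster: the NAD name, namespace, and network name are hypothetical, and a localnet network additionally needs a matching OVN bridge mapping on the nodes, which is not shown here.

```python
import json


def localnet_nad(name: str, namespace: str, network_name: str) -> dict:
    """Build a NetworkAttachmentDefinition for an ovn-k8s-cni-overlay localnet network."""
    config = {
        "cniVersion": "0.3.1",
        "name": network_name,                     # logical network name (hypothetical)
        "type": "ovn-k8s-cni-overlay",
        "topology": "localnet",
        "netAttachDefName": f"{namespace}/{name}",  # must match the NAD's namespace/name
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name, "namespace": namespace},
        # The CNI configuration is embedded as a JSON string.
        "spec": {"config": json.dumps(config)},
    }


# JSON is valid YAML, so the printed manifest can be piped to `oc apply -f -`.
manifest = localnet_nad("localnet-nad", "vm-test", "localnet-net")
print(json.dumps(manifest, indent=2))
```

The VM created in step 4 would then reference this NAD by name as a secondary network.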

       

      Actual results:

      After the cluster network operator upgrades, connectivity is lost to VMs over their localnet NADs.

       

      Expected results:

      Connectivity for VMs over their localnet NADs is maintained during and after the upgrade.

      Additional info:

      Details on specific testing will be added in comments.

       

      Affected Platforms:

      OpenShift Container Platform 4.18

              Felix Enrique Llorente Pastora (ellorent)
              Jade Clark-Muth (rhn-support-jclarkmu)
              Anurag Saxena
              Votes: 2
              Watchers: 11
