OCPBUGS-33500

[OVN-IPSEC] During upgrade from 4.13 to 4.14, ovn-ipsec stays in an error state until the entire OVN stack is migrated


    • 05/21: testing 4.13->4.14 upgrade; customer states "production upgrade would harm the availability of production". Likely affects any upgrade with IPsec to ovn-ic. Consider documenting as a known issue in the release notes; subsequent upgrades should be OK. Score elevated by px properties.

      Description of problem:

      When upgrading OpenShift from 4.13 to 4.14, the CNO migrates ovn-ipsec to its new 4.14 architecture first and then just waits until everything else is migrated. This already takes a while in a small cluster like the one I tested, and in mid-size to large clusters it may mean that ovn-ipsec stays unavailable, with initContainers in CrashLoopBackOff [1], for much longer. This is concerning to customers: they expect such disruption to be as small as possible, even though they understand that some is normal.
      
      Looking at the daemonsets, the ovn-ipsec pods need ovnkube-node to configure the node properly before the initContainer can create the key pairs and ovn-ipsec can start. In normal upgrades between errata releases, I understand why we upgrade ipsec first, but in this case I don't think having CNO upgrade it first is the best procedure.
      
      One important thing I noticed in the upgrade from 4.13.38 to 4.14.20 is that there was an intermediate rollout of the ovnkube-node and ovnkube-master daemonsets before the actual migration started. During all this time the new ovn-ipsec was already deployed and all of its pods were crashing.
      
      [1] + echo '2024-05-09T11:22:03+00:00 - ERROR - /etc/ovn/ovnkube-node-certs/ovnkube-client-current.pem not found'
      2024-05-09T11:22:03+00:00 - ERROR - /etc/ovn/ovnkube-node-certs/ovnkube-client-current.pem not found
      + return 1
      /bin/bash: line 16: return: can only `return' from a function or sourced script
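
      The last line of the trace in [1] is a secondary bash error: `return` is only valid inside a function or a sourced script, so the script fails on the `return` itself after logging the missing certificate. A minimal sketch of a check that signals the failure cleanly, assuming a hypothetical helper named `check_client_cert` (this is not the actual ovn-ipsec init container script):

```shell
#!/bin/bash
# Hypothetical helper for illustration only; not the real init container script.
# Inside a function `return 1` is valid. At script top level, use `exit 1` instead.
check_client_cert() {
  local cert="$1"
  if [[ ! -f "$cert" ]]; then
    echo "$(date -Iseconds) - ERROR - $cert not found" >&2
    return 1
  fi
}

# Example call, using the path from the log above:
# check_client_cert /etc/ovn/ovnkube-node-certs/ovnkube-client-current.pem || exit 1
```

      This would only tidy up the error reporting. The pods would still fail until ovnkube-node provides the certificate.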
      
          

      Version-Release number of selected component (if applicable):

      OCP 4.14
          

      How reproducible:

      Often
          

      Steps to Reproduce:

          1. Install OCP on 4.13
          2. Upgrade to 4.14
          3. Monitor the OVN pods
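
      For step 3, one possible way to monitor the pods is sketched below. It assumes the default `openshift-ovn-kubernetes` namespace; the helper name `count_crashlooping` is hypothetical and simply counts rows of `oc get pods --no-headers` output whose STATUS column contains CrashLoopBackOff (init container failures appear as `Init:CrashLoopBackOff`).

```shell
#!/bin/bash
# Sketch only: reads `oc get pods --no-headers` output on stdin and prints
# the number of pods whose STATUS (third column) contains CrashLoopBackOff.
count_crashlooping() {
  awk '$3 ~ /CrashLoopBackOff/ { n++ } END { print n + 0 }'
}

# Example use against a live cluster during the upgrade:
# oc get pods -n openshift-ovn-kubernetes --no-headers | count_crashlooping
# oc get pods -n openshift-ovn-kubernetes -w   # watch status changes as they happen
```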
          

      Actual results:

      ovn-ipsec pods stay in CrashLoopBackOff until the rest of the OVN stack is migrated and running
          

      Expected results:

      Minimal disruption during the upgrade
          

              rravaiol@redhat.com Riccardo Ravaioli
              rhn-support-andcosta Andre Costa
              Anurag Saxena Anurag Saxena