OpenShift SDN / SDN-4196

Impact of OCPBUGS-22293 [4.13] CNO fails to apply ovnkube-master daemonset during upgrade

    • Spike
    • Resolution: Done
    • Critical

      We're asking the following questions to evaluate whether or not OCPBUGS-22293 warrants changing update recommendations from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid recommending an update which introduces new risk or reduces cluster functionality in any way. In the absence of a declared update risk (the status quo), there is some risk that the existing fleet updates into the at-risk releases. Depending on the bug and estimated risk, leaving the update risk undeclared may be acceptable.

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      Clusters that were upgraded from 4.10 to 4.11 are vulnerable and will be affected by this issue
      if they later upgrade to 4.12.41+, 4.13.16+, or 4.14+.

      Which types of clusters?

      • Clusters that use the OVNKubernetes CNI and were previously upgraded from 4.10
        to 4.11 are affected.
      • Clusters that were initially installed with 4.11 or newer are not affected.
      • No reports of this issue are known for clusters using the OpenShiftSDN CNI, although
        this has not been specifically tested so far.
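
      The criteria above can be checked mechanically. The following is a minimal sketch, not an official check. The helper names are hypothetical, and it assumes the cluster's recorded update history is still complete (older entries may have been pruned) and that seeing both a 4.10.z and a 4.11.z entry means the cluster went through the 4.10 to 4.11 upgrade.

```shell
#!/bin/sh
# Hypothetical helpers: they only parse text, so they can be tried
# without cluster access.

# Succeeds when the whitespace-separated versions on stdin include both
# a 4.10.z and a 4.11.z release.
history_has_410_to_411() {
  awk '{
         for (i = 1; i <= NF; i++) {
           if ($i ~ /^4\.10\./) seen410 = 1
           if ($i ~ /^4\.11\./) seen411 = 1
         }
       }
       END { exit !(seen410 && seen411) }'
}

# Succeeds when the given network type is OVNKubernetes.
uses_ovn_kubernetes() {
  [ "$1" = "OVNKubernetes" ]
}

# Usage against a live cluster (requires a logged-in oc session):
#   net_type=$(oc get network.config cluster -o jsonpath='{.status.networkType}')
#   if oc get clusterversion version -o jsonpath='{.status.history[*].version}' |
#        history_has_410_to_411 && uses_ovn_kubernetes "$net_type"; then
#     echo "cluster matches the affected profile"
#   fi
```

      A negative result is not proof of safety, since the sketch relies only on what the update history still records.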

      What is the impact? Is it serious enough to warrant removing update recommendations?

      Susceptible clusters will get stuck in the network operator rollout during the upgrade. The only
      known way to let the upgrade progress is a two-step manual process.

      This will occur on the versions that introduced the changes and on all later versions.

      How involved is remediation?

      Resolving this is a two-step manual process:

      1. Edit the ovnkube-master daemonset and remove the "lifecycle" section from the ovnkube-master
         container (it contains only the preStop hook).
      2. Run 'oc rollout restart deployment cluster-version-operator -n openshift-cluster-version'.

      This can also be done as a pre-upgrade step instead of waiting for the upgrade to get stuck:

      # mark the network operator as unmanaged and remove the preStop hooks for the 'sbdb' and 'nbdb' containers:
      
        oc patch Network.operator.openshift.io cluster --type='merge'  -p='{"spec":{"managementState":"Unmanaged"}}'
      
        oc patch daemonset -nopenshift-ovn-kubernetes ovnkube-master --type='json' -p='[{"op": "remove", "path": "/spec/template/spec/containers/1/lifecycle/preStop"}, {"op": "remove", "path": "/spec/template/spec/containers/3/lifecycle/preStop"}]'
      
      
      # this will cause the network operator to update; wait for the rollout to finish
      
        oc wait co network --for='condition=PROGRESSING=False' --timeout=600s
      

      Initiate the upgrade after this. On completion, the network operator is moved back
      to "Managed" automatically as part of the upgrade.
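
      The JSON patch in the pre-upgrade step addresses containers by position (indexes 1 and 3), so it may be worth confirming that those positions really hold the 'nbdb' and 'sbdb' containers before applying it. A minimal sketch, assuming the container names are printed one per line; the helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: prints the zero-based position of the named container
# in a newline-separated list read from stdin, matching the indexing used in
# JSON Patch paths. Exits non-zero when the name is not found.
container_index() {
  awk -v name="$1" '
    $0 == name { print NR - 1; found = 1 }
    END { exit !found }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get daemonset -n openshift-ovn-kubernetes ovnkube-master \
#     -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\n"}{end}' |
#     container_index nbdb
```

      If the printed positions differ from the ones in the patch, the "path" values would need to be adjusted to match before running it.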

      Is this a regression?

      Yes.

              jluhrsen Jamo Luhrsen
              afri@afri.cz Petr Muller