Type: Spike
Resolution: Done
Priority: Critical

---
We're asking the following questions to evaluate whether or not OCPBUGS-22293 warrants changing update recommendations from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid recommending an update which introduces new risk or reduces cluster functionality in any way. In the absence of a declared update risk (the status quo), there is some risk that the existing fleet updates into the at-risk releases. Depending on the bug and estimated risk, leaving the update risk undeclared may be acceptable.
Which 4.y.z to 4.y'.z' updates increase vulnerability?
clusters that have upgraded from 4.10 to 4.11 are vulnerable and will be affected by this issue when/if they eventually upgrade to 4.12.41+, 4.13.16+, or 4.14+
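one way to check a cluster's exposure (a sketch, not part of the original statement; it assumes the default ClusterVersion object named 'version') is to inspect its update history for a 4.10.z or 4.11.z entry:
# list every release the cluster has been at, most recent history entry first
oc get clusterversion version -o jsonpath='{range .status.history[*]}{.version}{"\n"}{end}'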
Which types of clusters?
- clusters using the OVNKubernetes CNI that have previously been upgraded from 4.10 to 4.11 (see the check sketched after this list)
- clusters that were initially installed with 4.11 or newer are not affected
- no reports of this issue for clusters using the OpenShiftSDN CNI are known, although specific testing has not been done at this point
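a quick way to see which CNI a cluster uses (a sketch, not from the original report) is to read the network type from the cluster network config:
# prints OVNKubernetes or OpenShiftSDN
oc get network.config.openshift.io cluster -o jsonpath='{.status.networkType}{"\n"}'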
What is the impact? Is it serious enough to warrant removing update recommendations?
upgrades of clusters that are susceptible to this issue will get stuck in the network operator rollout, and the only known solution to allow the upgrade to progress is a two-step manual process.
this will occur on all versions beyond those that introduced the changes.
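as a sketch (not part of the original statement), the update status and the network cluster operator can be checked to confirm that an update is stuck on the network rollout:
# overall update status, including whether the update is still progressing
oc adm upgrade
# whether the network operator is still progressing or degraded
oc get clusteroperator network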
How involved is remediation?
resolving this is a two-step manual process:
- edit the ovnkube-master daemonset and remove the "lifecycle" section from the ovnkube-master container (which only includes the preStop hook); see the patch sketched after these steps
- run 'oc rollout restart deployment cluster-version-operator -n openshift-cluster-version'
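the first step can also be done non-interactively; the patch below is only a sketch of that edit, and the <index> placeholder (the position of the ovnkube-master container in the daemonset's container list) is not given in the original report and must be looked up first:
# list container names in order, to find the index of the ovnkube-master container
oc get daemonset ovnkube-master -n openshift-ovn-kubernetes -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\n"}{end}'
# remove that container's lifecycle section, substituting the index found above for <index>
oc patch daemonset ovnkube-master -n openshift-ovn-kubernetes --type='json' -p='[{"op": "remove", "path": "/spec/template/spec/containers/<index>/lifecycle"}]'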
this can be done as a pre-upgrade step instead of waiting for the upgrade to get stuck:
# mark the network operator as unmanaged and remove the preStop hooks for the 'sbdb' and 'nbdb' containers:
oc patch Network.operator.openshift.io cluster --type='merge' -p='{"spec":{"managementState":"Unmanaged"}}'
oc patch daemonset -n openshift-ovn-kubernetes ovnkube-master --type='json' -p='[{"op": "remove", "path": "/spec/template/spec/containers/1/lifecycle/preStop"}, {"op": "remove", "path": "/spec/template/spec/containers/3/lifecycle/preStop"}]'
# this will cause the network operator to update; wait for it to roll out
oc wait co network --for='condition=PROGRESSING=False' --timeout=600s
initiate the upgrade after this; upon completion, the network operator will be moved back to "Managed" automatically as part of the upgrade.
Is this a regression?
Yes.
is related to: OCPBUGS-22293 [4.13] CNO fails to apply ovnkube-master daemonset during upgrade (Closed)