Description of problem:
CNV upgrades from v4.14.1 to v4.15.0 (unreleased) are not starting due to an out-of-sync OperatorCondition.
We see:
$ oc get csv
NAME                                       DISPLAY                    VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.14.1   OpenShift Virtualization   4.14.1    kubevirt-hyperconverged-operator.v4.14.0   Replacing
kubevirt-hyperconverged-operator.v4.15.0   OpenShift Virtualization   4.15.0    kubevirt-hyperconverged-operator.v4.14.1   Pending
And on the v4.15.0 CSV:
$ oc get csv kubevirt-hyperconverged-operator.v4.15.0 -o yaml
....
status:
  cleanup: {}
  conditions:
  - lastTransitionTime: "2023-12-19T01:50:48Z"
    lastUpdateTime: "2023-12-19T01:50:48Z"
    message: requirements not yet checked
    phase: Pending
    reason: RequirementsUnknown
  - lastTransitionTime: "2023-12-19T01:50:48Z"
    lastUpdateTime: "2023-12-19T01:50:48Z"
    message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True" is outdated'
    phase: Pending
    reason: OperatorConditionNotUpgradeable
  lastTransitionTime: "2023-12-19T01:50:48Z"
  lastUpdateTime: "2023-12-19T01:50:48Z"
  message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True" is outdated'
  phase: Pending
  reason: OperatorConditionNotUpgradeable
And if we check the pending OperatorCondition (v4.14.1) we see:
$ oc get operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2023-12-16T17:10:17Z"
  generation: 18
  labels:
    operators.coreos.com/kubevirt-hyperconverged.openshift-cnv: ""
  name: kubevirt-hyperconverged-operator.v4.14.1
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: kubevirt-hyperconverged-operator.v4.14.1
    uid: 7db79d4b-e69e-4af8-9335-6269cf004440
  resourceVersion: "4116127"
  uid: 347306c9-865a-42b8-b2c9-69192b0e350a
spec:
  conditions:
  - lastTransitionTime: "2023-12-18T18:47:23Z"
    message: ""
    reason: Upgradeable
    status: "True"
    type: Upgradeable
  deployments:
  - hco-operator
  - hco-webhook
  - hyperconverged-cluster-cli-download
  - cluster-network-addons-operator
  - virt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  serviceAccounts:
  - hyperconverged-cluster-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
status:
  conditions:
  - lastTransitionTime: "2023-12-18T09:41:06Z"
    message: ""
    observedGeneration: 11
    reason: Upgradeable
    status: "True"
    type: Upgradeable
where metadata.generation (18) is not in sync with status.conditions[*].observedGeneration (11).
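A quicker way to spot the mismatch than reading the full YAML is to compare the two fields directly. This is a hypothetical one-liner (the JSONPath filter and namespace are assumptions based on the output above), with the output we would expect on the affected cluster:

# Compare metadata.generation with the Upgradeable condition's observedGeneration
# (hypothetical check; assumes the openshift-cnv namespace shown above)
$ oc get operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -n openshift-cnv \
    -o jsonpath='generation={.metadata.generation} observedGeneration={.status.conditions[?(@.type=="Upgradeable")].observedGeneration}{"\n"}'
generation=18 observedGeneration=11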
Even manually editing spec.conditions[*].lastTransitionTime bumps metadata.generation (as expected), but this does not trigger any reconciliation in OLM, so status.conditions[*].observedGeneration remains at 11.
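For reference, that manual edit can also be expressed as a patch. This is a hypothetical equivalent command; the condition payload simply mirrors the existing spec with a fresh lastTransitionTime:

# Rewrite the Upgradeable condition with a new lastTransitionTime to bump
# metadata.generation (hypothetical equivalent of the manual edit above)
$ oc patch operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -n openshift-cnv \
    --type=merge \
    -p '{"spec":{"conditions":[{"type":"Upgradeable","status":"True","reason":"Upgradeable","message":"","lastTransitionTime":"2023-12-18T18:47:25Z"}]}}'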
$ oc get operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2023-12-16T17:10:17Z"
  generation: 19
  labels:
    operators.coreos.com/kubevirt-hyperconverged.openshift-cnv: ""
  name: kubevirt-hyperconverged-operator.v4.14.1
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: kubevirt-hyperconverged-operator.v4.14.1
    uid: 7db79d4b-e69e-4af8-9335-6269cf004440
  resourceVersion: "4147472"
  uid: 347306c9-865a-42b8-b2c9-69192b0e350a
spec:
  conditions:
  - lastTransitionTime: "2023-12-18T18:47:25Z"
    message: ""
    reason: Upgradeable
    status: "True"
    type: Upgradeable
  deployments:
  - hco-operator
  - hco-webhook
  - hyperconverged-cluster-cli-download
  - cluster-network-addons-operator
  - virt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  serviceAccounts:
  - hyperconverged-cluster-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
status:
  conditions:
  - lastTransitionTime: "2023-12-18T09:41:06Z"
    message: ""
    observedGeneration: 11
    reason: Upgradeable
    status: "True"
    type: Upgradeable
Since its observedGeneration is out of sync, this check:
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/olm/operatorconditions.go#L44C1-L48
fails and the upgrade never starts.
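The effect of the failing check is visible directly on the pending CSV. This is a hypothetical JSONPath query over the status fields already shown above (the openshift-cnv namespace is an assumption), with the output we would expect:

# Show the phase/reason/message produced by the failing upgradeability check
# (hypothetical query; fields match the CSV output shown earlier)
$ oc get csv kubevirt-hyperconverged-operator.v4.15.0 -n openshift-cnv \
    -o jsonpath='{.status.phase}{" / "}{.status.reason}{": "}{.status.message}{"\n"}'
Pending / OperatorConditionNotUpgradeable: operator is not upgradeable: the operatorcondition status "Upgradeable"="True" is outdated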
I suspect (though I'm only guessing) that it could be a regression introduced by the memory optimization for https://issues.redhat.com/browse/OCPBUGS-17157.
Version-Release number of selected component (if applicable):
OCP 4.15.0-ec.3
How reproducible:
- Not reproducible (with the same CNV bundles) on OCP v4.14.z.
- Pretty high (but not 100%) on OCP 4.15.0-ec.3.
Steps to Reproduce:
1. Trigger a CNV v4.14.1 -> v4.15.0 upgrade on OCP 4.15.0-ec.3, for example as sketched below.
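One hypothetical way to trigger the upgrade, assuming the default kubevirt-hyperconverged Subscription in the openshift-cnv namespace (the subscription name and target channel are assumptions and may differ):

# Hypothetical trigger: point the CNV Subscription at the channel carrying
# v4.15.0 (subscription name and channel name are assumptions)
$ oc patch subscription kubevirt-hyperconverged -n openshift-cnv \
    --type=merge -p '{"spec":{"channel":"stable"}}'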
Actual results:
OLM is not reacting to changes to spec.conditions on the pending OperatorCondition, so metadata.generation stays permanently out of sync with status.conditions[*].observedGeneration and the CSV keeps reporting:

  message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True" is outdated'
  phase: Pending
  reason: OperatorConditionNotUpgradeable
Expected results:
OLM correctly reconciles the OperatorCondition and the upgrade starts.
Additional info:
Not reproducible with exactly the same bundles (source and target) on OCP v4.14.z.
- blocks: OCPBUGS-25818 CNV upgrades from v4.14.1 to v4.15.0 (unreleased) are not starting due to out of sync operatorCondition (Closed)
- duplicates: OCPBUGS-25672 CNV upgrades from v4.14.1 to v4.15.0 (unreleased) are not starting due to out of sync operatorCondition (Closed)
- is cloned by: OCPBUGS-25818 CNV upgrades from v4.14.1 to v4.15.0 (unreleased) are not starting due to out of sync operatorCondition (Closed)
- is related to: OCPBUGS-25448 olm-operator pod always restart due to "detected that every object is labelled, exiting to re-start the process..." when upgrading OCP to 4.15 from 4.14.6 (Closed)
- links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update