Story
Resolution: Done
Normal
Known Issue
OTA 262
Approved
Description of problem:
Applying oc adm upgrade --to-multi-arch sets the CVO condition Progressing=True for less than 2 minutes, after which .status.history records the transition as "Completed". The cluster itself, however, keeps progressing to the heterogeneous payload in the background for more than 30 minutes, mostly unnoticed. During this time "oc adm upgrade" usually shows nothing noticeable; only continuous monitoring reveals intermediate Upgradeable=False or Failing=True messages, and no further Progressing=True condition appears. Meanwhile, cluster operators are progressing in the background and some master nodes are NotReady,SchedulingDisabled.
Version-Release number of selected component (if applicable):
4.13.0-ec.2
How reproducible:
100%
Steps to Reproduce:
1. Apply the aforementioned command (oc adm upgrade --to-multi-arch).
2. Monitor oc adm upgrade, the CVO status, cluster operators, and nodes.
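The monitoring in step 2 could be partly automated. Below is a minimal sketch, not part of the original report, that flags the mismatch described here: the ClusterVersion reports Progressing other than True while individual cluster operators still report Progressing=True. The function names and the trimmed sample data are hypothetical; the sample only mimics the shape of the status.conditions lists on ClusterVersion and ClusterOperator resources.

```python
def condition_status(resource, cond_type):
    """Return the status string of the named condition, or None if absent."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def hidden_progress(cluster_version, cluster_operators):
    """Names of operators still Progressing=True while the CVO is not.

    Returns an empty list when the ClusterVersion itself reports
    Progressing=True, because then the progress is not hidden.
    """
    if condition_status(cluster_version, "Progressing") == "True":
        return []
    return sorted(
        op["metadata"]["name"]
        for op in cluster_operators.get("items", [])
        if condition_status(op, "Progressing") == "True"
    )


# Hypothetical, heavily trimmed sample modeled on the output in this report.
SAMPLE_CV = {
    "status": {"conditions": [
        {"type": "Available", "status": "True"},
        {"type": "Progressing", "status": "False"},
    ]}
}
SAMPLE_COS = {
    "items": [
        {"metadata": {"name": "kube-apiserver"},
         "status": {"conditions": [{"type": "Progressing", "status": "True"}]}},
        {"metadata": {"name": "baremetal"},
         "status": {"conditions": [{"type": "Progressing", "status": "False"}]}},
        {"metadata": {"name": "etcd"},
         "status": {"conditions": [{"type": "Progressing", "status": "True"}]}},
    ]
}

if __name__ == "__main__":
    print(hidden_progress(SAMPLE_CV, SAMPLE_COS))
```

On a live cluster the two inputs would presumably come from parsing the JSON output of oc get clusterversion version -o json and oc get clusteroperators -o json; that wiring is left out so the sketch stays runnable without a cluster.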
Actual results:
❯ oc adm upgrade --to-multi-arch
Requested update to multi cluster architecture

Thu 02 Mar 2023 18:10:41 IST
oc adm upgrade
info: An upgrade is in progress. Working towards 4.13.0-ec.2: 9 of 831 done (1% complete)

History:
2023-03-02T16:10:41Z Partial 4.13.0-ec.2 quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef
2023-03-02T09:31:36Z 2023-03-02T09:32:13Z Completed 4.13.0-ec.2 quay.io/openshift-release-dev/ocp-release@sha256:01192353b3c3e536779cfa0fc910064299df15ce01be0cff7188868588d32321

Conditions:
2023-03-02T09:29:45Z RetrievedUpdates=True :
2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
2023-03-02T08:59:13Z Failing=False :
2023-03-02T16:10:41Z Progressing=True : Working towards 4.13.0-ec.2: 9 of 831 done (1% complete)

Thu 02 Mar 2023 18:11:46 IST
oc adm upgrade
Cluster version is 4.13.0-ec.2

History:
2023-03-02T16:10:41Z 2023-03-02T16:11:46Z Completed 4.13.0-ec.2 quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef
2023-03-02T09:31:36Z 2023-03-02T09:32:13Z Completed 4.13.0-ec.2 quay.io/openshift-release-dev/ocp-release@sha256:01192353b3c3e536779cfa0fc910064299df15ce01be0cff7188868588d32321

Conditions:
2023-03-02T09:29:45Z RetrievedUpdates=True :
2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
2023-03-02T08:59:13Z Failing=False :
2023-03-02T16:11:46Z Progressing=False : Cluster version is 4.13.0-ec.2
#progressing false at this point already. no more Progressing=True for the rest of transition!!

oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.13.0-ec.2 True True False 7h14m APIServerDeploymentProgressing: deployment/apiserver.openshift-oauth-apiserver: 2/3 pods have been updated to the latest generation...
baremetal 4.13.0-ec.2 True False False 7h32m
cloud-controller-manager 4.13.0-ec.2 True False False 7h33m
cloud-credential 4.13.0-ec.2 True False False 7h32m
cluster-autoscaler 4.13.0-ec.2 True False False 7h31m
config-operator 4.13.0-ec.2 True False False 7h32m
console 4.13.0-ec.2 True False False 7h20m
control-plane-machine-set 4.13.0-ec.2 True False False 7h30m
csi-snapshot-controller 4.13.0-ec.2 True False False 7h32m
dns 4.13.0-ec.2 True True False 7h31m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6.\nHave 2 up-to-date DNS pods, want 6."...
etcd 4.13.0-ec.2 True True False 7h30m NodeInstallerProgressing: 3 nodes are at revision 8; 0 nodes have achieved new revision 9
image-registry 4.13.0-ec.2 True True False 7h26m Progressing: The deployment has not completed...
ingress 4.13.0-ec.2 True False False 7h26m
insights 4.13.0-ec.2 True False False 7h26m
kube-apiserver 4.13.0-ec.2 True True False 7h28m NodeInstallerProgressing: 3 nodes are at revision 6; 0 nodes have achieved new revision 7
kube-controller-manager 4.13.0-ec.2 True True False 7h29m NodeInstallerProgressing: 3 nodes are at revision 6; 0 nodes have achieved new revision 8
kube-scheduler 4.13.0-ec.2 True True False 7h29m NodeInstallerProgressing: 3 nodes are at revision 6; 0 nodes have achieved new revision 7
kube-storage-version-migrator 4.13.0-ec.2 True False False 7h32m
machine-api 4.13.0-ec.2 True False False 7h28m
machine-approver 4.13.0-ec.2 True False False 7h32m
machine-config 4.13.0-ec.2 True False False 7h30m
marketplace 4.13.0-ec.2 True False False 7h31m
monitoring 4.13.0-ec.2 True False False 7h25m
network 4.13.0-ec.2 True True False 7h33m DaemonSet "/openshift-sdn/sdn" update is rolling out (4 out of 6 updated)...
node-tuning 4.13.0-ec.2 True False False 7h31m
openshift-apiserver 4.13.0-ec.2 True True False 7h26m APIServerDeploymentProgressing: deployment/apiserver.openshift-apiserver: 1/3 pods have been updated to the latest generation
openshift-controller-manager 4.13.0-ec.2 True False False 7h28m
openshift-samples 4.13.0-ec.2 True False False 7h26m
operator-lifecycle-manager 4.13.0-ec.2 True False False 7h32m
operator-lifecycle-manager-catalog 4.13.0-ec.2 True False False 7h32m
operator-lifecycle-manager-packageserver 4.13.0-ec.2 True False False 7h26m
service-ca 4.13.0-ec.2 True False False 7h32m
storage 4.13.0-ec.2 True False False 7h32m

Thu 02 Mar 2023 18:17:51 IST
Cluster version is 4.13.0-ec.2

Upgradeable=False
Reason: PoolUpdating
Message: Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are updating, please see `oc get mcp` for further details

Conditions:
2023-03-02T09:29:45Z RetrievedUpdates=True :
2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
2023-03-02T08:59:13Z Failing=False :
2023-03-02T16:11:46Z Progressing=False : Cluster version is 4.13.0-ec.2
2023-03-02T16:17:50Z Upgradeable=False PoolUpdating: Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are updating, please see `oc get mcp` for further details

Thu 02 Mar 2023 18:23:22 IST
2023-03-02T09:29:45Z RetrievedUpdates=True :
2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
2023-03-02T08:59:13Z Failing=False :
2023-03-02T16:11:46Z Progressing=False : Cluster version is 4.13.0-ec.2
2023-03-02T16:17:50Z Upgradeable=False PoolUpdating: Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are updating, please see `oc get mcp` for further details

Thu 02 Mar 2023 18:27:37 IST
Conditions:
2023-03-02T09:29:45Z RetrievedUpdates=True :
2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
2023-03-02T16:27:36Z Failing=True ClusterOperatorDegraded: Cluster operator machine-config is degraded
2023-03-02T16:11:46Z Progressing=False ClusterOperatorDegraded: Error while reconciling 4.13.0-ec.2: the cluster operator machine-config is degraded
2023-03-02T16:17:50Z Upgradeable=False PoolUpdating: Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are updating, please see `oc get mcp` for further details

Thu 02 Mar 2023 18:32:37 IST
oc adm upgrade
Cluster version is 4.13.0-ec.2

Conditions:
2023-03-02T09:29:45Z RetrievedUpdates=True :
2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
2023-03-02T16:32:36Z Failing=False :
2023-03-02T16:11:46Z Progressing=False : Cluster version is 4.13.0-ec.2

oc get co
kube-apiserver 4.13.0-ec.2 True True False 7h48m NodeInstallerProgressing: 1 nodes are at revision 6; 2 nodes are at revision 7

Thu 02 Mar 2023 18:35:52 IST
Failing=True:
Reason: ClusterOperatorDegraded
Message: Cluster operator kube-apiserver is degraded
2023-03-02T16:11:46Z Progressing=False ClusterOperatorDegraded: Error while reconciling 4.13.0-ec.2: the cluster operator kube-apiserver is degraded

Thu 02 Mar 2023 18:39:22 IST
Cluster version is 4.13.0-ec.2
2023-03-02T09:29:45Z RetrievedUpdates=True :
2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
2023-03-02T16:39:21Z Failing=False :
2023-03-02T16:11:46Z Progressing=False : Cluster version is 4.13.0-ec.2
Expected results:
Progressing=True for as long as the cluster is still progressing.
Additional info:
Part of the time, "oc adm upgrade" shows as if nothing is happening. An upgrade applied during this time, with --to-latest for example, takes unusually long, around 2 hours to complete, with intermediate timed-out messages in operators.
Definition of done:
* CVO should report Progressing=True for the whole duration of the --to-multi-arch transition.
- is blocked by: OTA-962 Enhancement proposal for cluster version status should report transitions from single arch to multi arch correctly (Closed)