-
Bug
-
Resolution: Done
-
Undefined
-
None
-
4.11.z
-
None
-
False
-
2/21: telco reviewed offline - see comment
This bug is a backport clone of [Bugzilla Bug 2089950](https://bugzilla.redhat.com/show_bug.cgi?id=2089950). The following is the description of the original bug:
—
Description of problem: Some upgrades failed during scale testing with messages indicating the console operator is not available. In total 5 out of 2200 clusters failed with this pattern.
These clusters are all configured with the Console operator disabled in order to reduce overall OCP CPU use in the Telecom environment. The following CR is applied:
apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "false"
    include.release.openshift.io/self-managed-high-availability: "false"
    include.release.openshift.io/single-node-developer: "false"
    release.openshift.io/create-only: "true"
    ran.openshift.io/ztp-deploy-wave: "10"
  name: cluster
spec:
  logLevel: Normal
  managementState: Removed
  operatorLogLevel: Normal
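When triaging many clusters it can help to confirm that a given cluster really has the console disabled as in the CR above. The following is a minimal Python sketch, assuming the Console CR has already been fetched and parsed into a dictionary (for example from JSON output of the cluster). The function name `console_is_removed` and the `CONSOLE_CR` sample are hypothetical and only mirror the fields shown in this report.

```python
# Hypothetical sample mirroring the spec of the Console CR quoted in this report.
CONSOLE_CR = {
    "apiVersion": "operator.openshift.io/v1",
    "kind": "Console",
    "metadata": {"name": "cluster"},
    "spec": {
        "logLevel": "Normal",
        "managementState": "Removed",
        "operatorLogLevel": "Normal",
    },
}


def console_is_removed(console_cr):
    """Return True if the parsed Console CR sets managementState to Removed."""
    spec = console_cr.get("spec") or {}
    return spec.get("managementState") == "Removed"


if __name__ == "__main__":
    print(console_is_removed(CONSOLE_CR))
```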
From one cluster (sno01175) the ClusterVersion conditions show:
oc get clusterversion version -o jsonpath='{.status.conditions}' | jq
[
  { "lastTransitionTime": "2022-05-19T01:44:13Z", "message": "Done applying 4.9.26", "status": "True", "type": "Available" },
  { "lastTransitionTime": "2022-05-24T14:57:50Z", "message": "Cluster operator console is degraded", "reason": "ClusterOperatorDegraded", "status": "True", "type": "Failing" },
  { "lastTransitionTime": "2022-05-24T13:49:43Z", "message": "Unable to apply 4.10.13: wait has exceeded 40 minutes for these operators: console", "reason": "ClusterOperatorDegraded", "status": "True", "type": "Progressing" },
  { "lastTransitionTime": "2022-05-21T02:07:06Z", "status": "True", "type": "RetrievedUpdates" },
  { "lastTransitionTime": "2022-05-24T13:53:05Z", "message": "Payload loaded version=\"4.10.13\" image=\"quay.io/openshift-release-dev/ocp-release@sha256:4f516616baed3cf84585e753359f7ef2153ae139c2e80e0191902fbd073c4143\"", "reason": "PayloadLoaded", "status": "True", "type": "ReleaseAccepted" },
  { "lastTransitionTime": "2022-05-24T13:57:05Z", "message": "Cluster operator kube-apiserver should not be upgraded between minor versions: KubeletMinorVersionUpgradeable: Kubelet minor version (1.22.5+5c84e52) on node sno01175 will not be supported in the next OpenShift minor version upgrade.", "reason": "KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade", "status": "False", "type": "Upgradeable" }
]
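To spot this failure pattern across a large fleet, the conditions array can be checked programmatically. The following is a minimal Python sketch, assuming the conditions have been saved as JSON. The helper names `stuck_operators` and `is_failing` are hypothetical, and the regular expression is based only on the Progressing message quoted above, so other message wordings would not match.

```python
import json
import re

# Matches the wording of the Progressing message seen on sno01175 (an assumption:
# other releases or failure modes may phrase this message differently).
WAIT_RE = re.compile(r"wait has exceeded \d+ minutes for these operators: (.+)$")

# Two of the conditions from the output above, trimmed to the relevant fields.
SAMPLE = json.loads("""
[
  {"message": "Cluster operator console is degraded",
   "reason": "ClusterOperatorDegraded", "status": "True", "type": "Failing"},
  {"message": "Unable to apply 4.10.13: wait has exceeded 40 minutes for these operators: console",
   "reason": "ClusterOperatorDegraded", "status": "True", "type": "Progressing"}
]
""")


def is_failing(conditions):
    """Return True if the ClusterVersion reports a Failing condition with status True."""
    return any(c.get("type") == "Failing" and c.get("status") == "True" for c in conditions)


def stuck_operators(conditions):
    """Return the operators named in a 'wait has exceeded' Progressing message, else []."""
    for cond in conditions:
        if cond.get("type") == "Progressing" and cond.get("status") == "True":
            match = WAIT_RE.search(cond.get("message", ""))
            if match:
                return [name.strip() for name in match.group(1).split(",")]
    return []


if __name__ == "__main__":
    print(is_failing(SAMPLE), stuck_operators(SAMPLE))
```

A cluster where `is_failing` is true and `stuck_operators` includes `console` would be a candidate for the pattern described in this bug; the Console CR still needs to be checked separately.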
Another cluster (sno01959) has very similar conditions, with slight variations in the Failing and Progressing messages.
Version-Release number of selected component (if applicable): 4.9.26 upgrade to 4.10.13
How reproducible: 5 out of 2200
Steps to Reproduce:
1. Disable console with managementState: Removed
2. Start from OCP version 4.9.26
3. Initiate upgrade to 4.10.13 via ClusterVersion CR
Actual results: Cluster upgrade is stuck (no longer progressing) for 5+ hours
Expected results: Cluster upgrade completes
Additional info:
- blocks: OCPBUGS-11342 Upgrade fails with message Cluster operator console is not available (Closed)
- is cloned by: OCPBUGS-11342 Upgrade fails with message Cluster operator console is not available (Closed)