OCPBUGS-5518

Upgrade fails with message Cluster operator console is not available


      This bug is a backport clone of [Bugzilla Bug 2089950](https://bugzilla.redhat.com/show_bug.cgi?id=2089950). The following is the description of the original bug:

      Description of problem: Some upgrades failed during scale testing with messages indicating that the console cluster operator is not available. In total, 5 out of 2200 clusters failed with this pattern.

      These clusters are all configured with the Console operator disabled in order to reduce overall OCP CPU use in the Telecom environment. The following CR is applied:
      apiVersion: operator.openshift.io/v1
      kind: Console
      metadata:
        annotations:
          include.release.openshift.io/ibm-cloud-managed: "false"
          include.release.openshift.io/self-managed-high-availability: "false"
          include.release.openshift.io/single-node-developer: "false"
          release.openshift.io/create-only: "true"
          ran.openshift.io/ztp-deploy-wave: "10"
        name: cluster
      spec:
        logLevel: Normal
        managementState: Removed
        operatorLogLevel: Normal

      From one cluster (sno01175) the ClusterVersion conditions show:

        oc get clusterversion version -o jsonpath='{.status.conditions}' | jq

        { "lastTransitionTime": "2022-05-19T01:44:13Z", "message": "Done applying 4.9.26", "status": "True", "type": "Available" }


        { "lastTransitionTime": "2022-05-24T14:57:50Z", "message": "Cluster operator console is degraded", "reason": "ClusterOperatorDegraded", "status": "True", "type": "Failing" }


        { "lastTransitionTime": "2022-05-24T13:49:43Z", "message": "Unable to apply 4.10.13: wait has exceeded 40 minutes for these operators: console", "reason": "ClusterOperatorDegraded", "status": "True", "type": "Progressing" }


        { "lastTransitionTime": "2022-05-21T02:07:06Z", "status": "True", "type": "RetrievedUpdates" }


        { "lastTransitionTime": "2022-05-24T13:53:05Z", "message": "Payload loaded version=\"4.10.13\" image=\"quay.io/openshift-release-dev/ocp-release@sha256:4f516616baed3cf84585e753359f7ef2153ae139c2e80e0191902fbd073c4143\"", "reason": "PayloadLoaded", "status": "True", "type": "ReleaseAccepted" }


        { "lastTransitionTime": "2022-05-24T13:57:05Z", "message": "Cluster operator kube-apiserver should not be upgraded between minor versions: KubeletMinorVersionUpgradeable: Kubelet minor version (1.22.5+5c84e52) on node sno01175 will not be supported in the next OpenShift minor version upgrade.", "reason": "KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade", "status": "False", "type": "Upgradeable" }


      Another cluster (sno01959) has very similar conditions with slight variation in the Failing and Progressing messages:

      { "lastTransitionTime": "2022-05-24T14:32:42Z", "message": "Cluster operator console is not available", "reason": "ClusterOperatorNotAvailable", "status": "True", "type": "Failing" }


      { "lastTransitionTime": "2022-05-24T13:52:04Z", "message": "Unable to apply 4.10.13: the cluster operator console has not yet successfully rolled out", "reason": "ClusterOperatorNotAvailable", "status": "True", "type": "Progressing" }
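The conditions above name the blocking operator in three different message shapes. As a rough triage aid, the following Python sketch pulls operator names out of the `Failing` and `Progressing` conditions. The function name `blocking_operators` and the regular expressions are assumptions made for illustration; they cover only the message formats quoted in this report and are not part of any OpenShift tooling.

```python
import re

def blocking_operators(conditions):
    """Return sorted operator names mentioned in Failing/Progressing conditions.

    Hypothetical helper: it only understands the message shapes quoted
    in this bug report.
    """
    names = set()
    for cond in conditions:
        if cond.get("type") not in ("Failing", "Progressing"):
            continue
        if cond.get("status") != "True":
            continue
        if not cond.get("reason", "").startswith("ClusterOperator"):
            continue
        message = cond.get("message", "")
        # "Cluster operator console is degraded" and
        # "... the cluster operator console has not yet ..."
        names.update(re.findall(r"[Cc]luster operator (\S+)", message))
        # "... for these operators: console"
        match = re.search(r"these operators: (.+)$", message)
        if match:
            names.update(n.strip() for n in match.group(1).split(","))
    return sorted(names)

# Conditions copied from the two clusters in this report (timestamps omitted).
sno01175 = [
    {"type": "Failing", "status": "True", "reason": "ClusterOperatorDegraded",
     "message": "Cluster operator console is degraded"},
    {"type": "Progressing", "status": "True", "reason": "ClusterOperatorDegraded",
     "message": "Unable to apply 4.10.13: wait has exceeded 40 minutes "
                "for these operators: console"},
]
sno01959 = [
    {"type": "Failing", "status": "True", "reason": "ClusterOperatorNotAvailable",
     "message": "Cluster operator console is not available"},
    {"type": "Progressing", "status": "True", "reason": "ClusterOperatorNotAvailable",
     "message": "Unable to apply 4.10.13: the cluster operator console "
                "has not yet successfully rolled out"},
]

print(blocking_operators(sno01175))
print(blocking_operators(sno01959))
```

In practice the input would be the parsed JSON array returned by the `oc get clusterversion` command shown earlier.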


      Version-Release number of selected component (if applicable): 4.9.26 upgrade to 4.10.13

      How reproducible: 5 out of 2200

      Steps to Reproduce:
      1. Disable the console by setting managementState: Removed in the Console CR
      2. Start from OCP version 4.9.26
      3. Initiate the upgrade to 4.10.13 via the ClusterVersion CR
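The report does not say exactly how the ClusterVersion CR was modified. One plausible way, sketched below in Python, is a merge patch that sets `spec.desiredUpdate.version`. The helper name `upgrade_patch_command` is hypothetical, and the sketch only builds the `oc patch` argument list; it does not contact a cluster.

```python
import json

def upgrade_patch_command(version):
    """Build an `oc patch` argv that requests an update to `version`.

    Assumption: the update is requested by setting spec.desiredUpdate.version
    on the ClusterVersion object named "version". The report itself does not
    show the patch that was used.
    """
    patch = {"spec": {"desiredUpdate": {"version": version}}}
    return [
        "oc", "patch", "clusterversion", "version",
        "--type", "merge",
        "-p", json.dumps(patch),
    ]

command = upgrade_patch_command("4.10.13")
print(" ".join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` on a host that is logged in to the cluster.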

      Actual results: Cluster upgrade is stuck (no longer progressing) for 5+ hours
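To flag clusters in this state during a large rollout, the age of the `Progressing` condition can be compared against a threshold. The Python sketch below assumes a 5-hour threshold (taken from the "5+ hours" above) and a hypothetical observation time; `upgrade_age_hours` and `is_stuck` are illustrative names, not existing tooling.

```python
from datetime import datetime, timezone

def upgrade_age_hours(conditions, now):
    """Hours since Progressing last became True, or None if not progressing."""
    for cond in conditions:
        if cond.get("type") == "Progressing" and cond.get("status") == "True":
            started = datetime.strptime(
                cond["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
            return (now - started).total_seconds() / 3600.0
    return None

def is_stuck(conditions, now, threshold_hours=5.0):
    """True if the upgrade has been progressing for at least threshold_hours.

    The 5-hour default is an assumption based on this report.
    """
    age = upgrade_age_hours(conditions, now)
    return age is not None and age >= threshold_hours

# Progressing condition from sno01175; the observation times are hypothetical.
conditions = [
    {"type": "Progressing", "status": "True",
     "lastTransitionTime": "2022-05-24T13:49:43Z"},
]
early = datetime(2022, 5, 24, 15, 0, 0, tzinfo=timezone.utc)
late = datetime(2022, 5, 24, 19, 0, 0, tzinfo=timezone.utc)

print(is_stuck(conditions, early))
print(is_stuck(conditions, late))
```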

      Expected results: Cluster upgrade completes

      Additional info:

            Jakub Hadvig (jhadvig@redhat.com)
            OpenShift Prow Bot (openshift-crt-jira-prow)
            YaDan Pei