  1. OpenShift Bugs
  2. OCPBUGS-11342

Upgrade fails with message Cluster operator console is not available

Details

    • 2/21: telco reviewed offline - see comment

    Description

      This is a clone of issue OCPBUGS-5518. The following is the description of the original issue:

      This bug is a backport clone of [Bugzilla Bug 2089950](https://bugzilla.redhat.com/show_bug.cgi?id=2089950). The following is the description of the original bug:

      Description of problem: Some upgrades failed during scale testing with messages indicating the console operator is not available. In total 5 out of 2200 clusters failed with this pattern.

      These clusters are all configured with the Console operator disabled to reduce overall OCP CPU use in the Telecom environment. The following CR is applied:
      apiVersion: operator.openshift.io/v1
      kind: Console
      metadata:
        annotations:
          include.release.openshift.io/ibm-cloud-managed: "false"
          include.release.openshift.io/self-managed-high-availability: "false"
          include.release.openshift.io/single-node-developer: "false"
          release.openshift.io/create-only: "true"
          ran.openshift.io/ztp-deploy-wave: "10"
        name: cluster
      spec:
        logLevel: Normal
        managementState: Removed
        operatorLogLevel: Normal

      From one cluster (sno01175) the ClusterVersion conditions show:

      oc get clusterversion version -o jsonpath='{.status.conditions}' | jq
      [
        { "lastTransitionTime": "2022-05-19T01:44:13Z", "message": "Done applying 4.9.26", "status": "True", "type": "Available" },
        { "lastTransitionTime": "2022-05-24T14:57:50Z", "message": "Cluster operator console is degraded", "reason": "ClusterOperatorDegraded", "status": "True", "type": "Failing" },
        { "lastTransitionTime": "2022-05-24T13:49:43Z", "message": "Unable to apply 4.10.13: wait has exceeded 40 minutes for these operators: console", "reason": "ClusterOperatorDegraded", "status": "True", "type": "Progressing" },
        { "lastTransitionTime": "2022-05-21T02:07:06Z", "status": "True", "type": "RetrievedUpdates" },
        { "lastTransitionTime": "2022-05-24T13:53:05Z", "message": "Payload loaded version=\"4.10.13\" image=\"quay.io/openshift-release-dev/ocp-release@sha256:4f516616baed3cf84585e753359f7ef2153ae139c2e80e0191902fbd073c4143\"", "reason": "PayloadLoaded", "status": "True", "type": "ReleaseAccepted" },
        { "lastTransitionTime": "2022-05-24T13:57:05Z", "message": "Cluster operator kube-apiserver should not be upgraded between minor versions: KubeletMinorVersionUpgradeable: Kubelet minor version (1.22.5+5c84e52) on node sno01175 will not be supported in the next OpenShift minor version upgrade.", "reason": "KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade", "status": "False", "type": "Upgradeable" }
      ]

      Another cluster (sno01959) has very similar conditions with slight variation in the Failing and Progressing messages:

      { "lastTransitionTime": "2022-05-24T14:32:42Z", "message": "Cluster operator console is not available", "reason": "ClusterOperatorNotAvailable", "status": "True", "type": "Failing" },
      { "lastTransitionTime": "2022-05-24T13:52:04Z", "message": "Unable to apply 4.10.13: the cluster operator console has not yet successfully rolled out", "reason": "ClusterOperatorNotAvailable", "status": "True", "type": "Progressing" },
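
When triaging many clusters at scale, it can help to pull the blocking operator names out of the ClusterVersion conditions automatically. The following is a minimal sketch, not part of the original report: the function name `blocking_operators` and both regular expressions are assumptions, written only against the message formats quoted above. Other CVO message wordings may not match.

```python
import re

# Assumption: operator names follow the phrase "cluster operator" in CVO
# messages, as in the conditions quoted in this report.
_OPERATOR_RE = re.compile(r"[Cc]luster operator (\S+)")
# Assumption: timeout messages end with "for these operators: a, b, c".
_WAIT_RE = re.compile(r"for these operators: (.+)$")


def blocking_operators(conditions):
    """Return sorted operator names named by active Failing/Progressing conditions."""
    names = set()
    for cond in conditions:
        # Only conditions that signal a stalled or failing update are relevant;
        # Upgradeable, Available, etc. are ignored.
        if cond.get("type") not in ("Failing", "Progressing"):
            continue
        if cond.get("status") != "True":
            continue
        message = cond.get("message", "")
        names.update(_OPERATOR_RE.findall(message))
        wait = _WAIT_RE.search(message)
        if wait:
            names.update(n.strip() for n in wait.group(1).split(","))
    return sorted(names)
```

The input is the list under `.status.conditions`, for example the JSON array printed by the `oc get clusterversion` command above after loading it with `json.loads`.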

      Version-Release number of selected component (if applicable): 4.9.26 upgrade to 4.10.13

      How reproducible: 5 out of 2200

      Steps to Reproduce:
      1. Disable console with managementState: Removed
      2. Start from OCP version 4.9.26
      3. Initiate upgrade to 4.10.13 via ClusterVersion CR
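
The steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure used in the scale test: the report applies the Console CR through ZTP rather than `oc patch`, the helper name `reproduction_commands` is hypothetical, and setting only `spec.desiredUpdate.version` assumes the target version is among the cluster's available updates. The function only builds the command lines; it does not run them.

```python
import json


def reproduction_commands(target_version):
    """Build (but do not run) oc commands for steps 1 and 3 of the reproducer."""
    # Step 1: disable the console via the operator config named "cluster".
    console_patch = {"spec": {"managementState": "Removed"}}
    # Step 3: request the update through the ClusterVersion CR.
    # Assumption: the version alone is enough because it is an available update.
    upgrade_patch = {"spec": {"desiredUpdate": {"version": target_version}}}
    return [
        ["oc", "patch", "consoles.operator.openshift.io", "cluster",
         "--type=merge", "-p", json.dumps(console_patch)],
        ["oc", "patch", "clusterversion", "version",
         "--type=merge", "-p", json.dumps(upgrade_patch)],
    ]
```

Each returned list can be passed to `subprocess.run` against a 4.9.26 cluster; keeping construction separate from execution makes the patches easy to review first.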

      Actual results: Cluster upgrade is stuck (no longer progressing) for 5+ hours

      Expected results: Cluster upgrade completes

      Additional info:

            People

              jhadvig@redhat.com Jakub Hadvig
              openshift-crt-jira-prow OpenShift Prow Bot
              YaDan Pei YaDan Pei
              Jakub Hadvig, YaDan Pei