OCPBUGS-58452

Upgrade from 4.14.1 to 4.15.0-ec.2 is stuck but not reported as such by CVO

    • Quality / Stability / Reliability
    • False
    • 0.5
    • Moderate
    • No
    • OTA 273, OTA 274
    • 2
    • In Progress
    • Bug Fix
      * Before this update, when a Cluster Operator took a long time to upgrade, the Cluster Version Operator (CVO) did not report anything because it could not determine whether the upgrade was still progressing or already stuck. With this release, the CVO can report a new Unknown status for the Failing condition in the cluster version status, which reminds cluster administrators to check the cluster. As a result, administrators no longer need to wait indefinitely on a blocked Cluster Operator upgrade. (link:https://issues.redhat.com/browse/OCPBUGS-58452[OCPBUGS-58452])
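
The release note for this issue describes a third value, Unknown, for the Failing condition. As a rough illustration of how an administrator's script might react to it, the Python sketch below reads the conditions from a ClusterVersion JSON document. The sample document, its reason and message strings, and the helper names `failing_condition` and `advice` are assumptions made up for this example; they are not CVO output or CVO code.

```python
import json
from typing import Optional

# Hypothetical, trimmed ClusterVersion document. The reason and message
# strings are invented for illustration only.
SAMPLE = """
{
  "status": {
    "conditions": [
      {"type": "Progressing", "status": "True",
       "message": "Working towards 4.15.0-ec.2"},
      {"type": "Failing", "status": "Unknown",
       "reason": "ExampleReason",
       "message": "waiting on operator-lifecycle-manager-packageserver"}
    ]
  }
}
"""


def failing_condition(cluster_version: dict) -> Optional[dict]:
    """Return the condition whose type is "Failing", or None if absent."""
    conditions = cluster_version.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "Failing":
            return cond
    return None


def advice(cond: Optional[dict]) -> str:
    """Map the Failing condition to a short hint for the administrator."""
    if cond is None:
        return "no Failing condition reported"
    status = cond.get("status")
    message = cond.get("message", "")
    if status == "True":
        return "update is failing: " + message
    if status == "Unknown":
        # Neither clearly progressing nor clearly failed: worth a manual look.
        return "update may be stuck, check the cluster: " + message
    return "no failure reported"


if __name__ == "__main__":
    print(advice(failing_condition(json.loads(SAMPLE))))
```

On a live cluster the input would presumably come from something like `oc get clusterversion version -o json`; the sketch only assumes the usual `status.conditions` list with `type`, `status`, and `message` fields.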

      This is a clone of issue OCPBUGS-23514. The following is the description of the original issue:

      Description of problem:

      The upgrade of the ota-stage cluster from 4.14.1 to 4.15.0-ec.2 got stuck because the operator-lifecycle-manager-packageserver ClusterOperator never reaches the desired version. This is likely because its Pods are crash-looping, which is a separate issue that is being discussed on Slack and is tracked in OCPBUGS-23538.

      However, I would expect CVO to enter its "waiting for operator-lifecycle-manager-packageserver up to 40 minutes" state, eventually hit that deadline, and signal the upgrade as stuck via a Failing=True condition. That did not happen: CVO does not signal anything problematic in this stuck state.
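
The expectation in this description can be sketched as a simple deadline check. This is only an illustration of the behavior the reporter expected, not the CVO's actual logic: the function name `expected_failing_status`, the timestamps, and the strict "greater than" comparison are assumptions, and the 40-minute value is taken from the description itself.

```python
from datetime import datetime, timedelta

# Assumption: 40 minutes is the wait deadline mentioned in the description.
# The real CVO thresholds and code paths may differ.
WAIT_DEADLINE = timedelta(minutes=40)


def expected_failing_status(waiting_since: datetime, now: datetime,
                            deadline: timedelta = WAIT_DEADLINE) -> str:
    """Return "True" once the wait has exceeded the deadline, else "False"."""
    if now < waiting_since:
        raise ValueError("now must not be earlier than waiting_since")
    return "True" if now - waiting_since > deadline else "False"


if __name__ == "__main__":
    # Arbitrary example timestamps, not taken from the attached logs.
    start = datetime(2023, 11, 1, 12, 0)
    for waited in (timedelta(minutes=10), timedelta(hours=2)):
        status = expected_failing_status(start, start + waited)
        print(f"waited {waited}: Failing={status}")
```

In the reported situation the update had been running for more than two hours, so under this sketch a wait of that length on a single ClusterOperator would yield Failing=True rather than silence.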

      Version-Release number of selected component (if applicable):

      upgrade from 4.14.1 to 4.15.0-ec.2

      How reproducible:

      Unsure

      Steps to Reproduce:

      1. Upgrade from 4.14.1 to 4.15.0-ec.2 and hope the cluster gets stuck the way ota-stage did.

      Actual results:

      $ OC_ENABLE_CMD_UPGRADE_STATUS=true ./oc adm upgrade status
      An update is in progress for 2h8m20s: Working towards 4.15.0-ec.2: 695 of 863 done (80% complete), waiting on operator-lifecycle-manager-packageserver
      

      Expected results:

      $ oc adm upgrade status
      Failing=True
        Reason: operator-lifecycle-manager-packageserver is stuck (or whatever is the message)
      
      An update is in progress for 2h8m20s: Working towards 4.15.0-ec.2: 695 of 863 done (80% complete), waiting on operator-lifecycle-manager-packageserver
      

      Additional info:

      Attached are the CVO log and a YAML dump of the waited-on ClusterOperator.

              afri@afri.cz Petr Muller
              openshift-crt-jira-prow OpenShift Prow Bot
              None
              None
              Dinesh Kumar S Dinesh Kumar S
              None