OpenShift Bugs / OCPBUGS-23514

Upgrade from 4.14.1 to 4.15.0-ec.2 is stuck but not reported as such by CVO


    • 3
    • Moderate
    • No
    • OTA 267, OTA 269, OTA 268
    • 3
    • Done
    • Bug Fix
    • Previously, when a Cluster Operator took a long time to upgrade, the Cluster Version Operator (CVO) did not report anything, because it could not determine whether the upgrade was still progressing or already stuck. With this release, the CVO can report an `Unknown` status for the `Failing` condition in the Cluster Version status, which reminds cluster administrators to check the cluster instead of waiting on a blocked Cluster Operator upgrade. (link:https://issues.redhat.com/browse/OCPBUGS-23514[OCPBUGS-23514])

      Description of problem:

      The upgrade of the ota-stage cluster from 4.14.1 to 4.15.0-ec.2 got stuck on the operator-lifecycle-manager-packageserver ClusterOperator, which never reaches the desired version. This is likely because its Pods are CrashLooping, which is a separate issue currently discussed on Slack and filed as OCPBUGS-23538.

      However, I would expect the CVO to enter its "waiting up to 40 minutes for operator-lifecycle-manager-packageserver" state, eventually hit that deadline, and signal that the upgrade is stuck through a Failing=True condition. That did not happen: the CVO does not signal anything problematic in this stuck state.
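The expectation above can be sketched as a small shell function. This is a hypothetical model of the behavior the reporter expected, not the CVO's actual implementation: the function name `expected_failing_status` and the use of whole minutes are assumptions, while the 40-minute deadline comes from the report. While the CVO waits on the same ClusterOperator, Failing stays False until the deadline is reached and then becomes True.

```shell
#!/usr/bin/env bash
# Hypothetical model of the reporter's expectation, NOT the CVO's code:
# while the CVO waits on one ClusterOperator, Failing stays False until
# the deadline is reached, then becomes True.

# Deadline from the report, in minutes (assumed to be whole minutes).
WAIT_DEADLINE_MINUTES=40

# expected_failing_status MINUTES_WAITED
# Prints the Failing condition status expected after the CVO has been
# waiting MINUTES_WAITED minutes on the same ClusterOperator.
expected_failing_status() {
  local waited="$1"
  if [ "$waited" -ge "$WAIT_DEADLINE_MINUTES" ]; then
    echo "True"
  else
    echo "False"
  fi
}

expected_failing_status 128   # prints True
expected_failing_status 10    # prints False
```

Under this model, any operator that blocks the update past the deadline produces Failing=True; the bug is that the ota-stage cluster showed no such signal.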

      Version-Release number of selected component (if applicable):

      upgrade from 4.14.1 to 4.15.0-ec.2

      How reproducible:

      Unsure

      Steps to Reproduce:

      1. Upgrade from 4.14.1 to 4.15.0-ec.2 and hope the upgrade gets stuck the way it did on ota-stage.

      Actual results:

      $ OC_ENABLE_CMD_UPGRADE_STATUS=true ./oc adm upgrade status
      An update is in progress for 2h8m20s: Working towards 4.15.0-ec.2: 695 of 863 done (80% complete), waiting on operator-lifecycle-manager-packageserver
      

      Expected results:

      $ oc adm upgrade status
      Failing=True
        Reason: operator-lifecycle-manager-packageserver is stuck (or whatever the message is)
      
      An update is in progress for 2h8m20s: Working towards 4.15.0-ec.2: 695 of 863 done (80% complete), waiting on operator-lifecycle-manager-packageserver
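To check the condition from the expected result directly, rather than through `oc adm upgrade status`, one option is to read the `Failing` condition from the cluster's ClusterVersion object (named `version`) with a standard `oc get ... -o jsonpath` query. The wrapper function name `cv_failing_condition` is hypothetical; the query itself relies only on JSONPath filtering as supported by `oc get`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the status, reason, and message of the
# "Failing" condition on the ClusterVersion named "version", one per line.
# Requires a logged-in `oc` client; prints nothing if the condition is absent.
cv_failing_condition() {
  oc get clusterversion version \
    -o jsonpath='{range .status.conditions[?(@.type=="Failing")]}{.status}{"\n"}{.reason}{"\n"}{.message}{"\n"}{end}'
}

# Usage on a live cluster:
#   cv_failing_condition
```

On the stuck ota-stage cluster, the reporter expected the first printed line to read True.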
      

      Additional info

      Attached: the CVO log and a YAML dump of the waited-on ClusterOperator.

              hongkliu Hongkai Liu
              afri@afri.cz Petr Muller
              Dinesh Kumar S
              Votes: 0
              Watchers: 9
