OCPBUGS-32382

Control Plane Upgrade unable to complete with zero worker nodes


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Critical
    • Affects Version/s: 4.15.z
    • Component/s: HyperShift
    • Severity: Moderate
    • Sprint: Hypershift Sprint 253

      Description of problem:

      In ROSA HCP, the hypershift.openshift.io/force-upgrade-to annotation is used for all OCP upgrades (z-stream/y-stream). Although this annotation allows control plane upgrades to start even when the `ClusterVersionUpgradeable` status condition on the HostedCluster is `False`, it does not guarantee that the upgrade will succeed.
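
      For reference, the annotation is set on the HostedCluster resource on the management cluster, and the upgrade itself is driven by updating `spec.release.image`. A minimal sketch, assuming a HostedCluster named `example` in the `clusters` namespace (both placeholders) and an example release pullspec:
      ---
      # Allow the upgrade to start despite Upgradeable=False gating.
      # "example", "clusters", and the release image are placeholders.
      oc annotate hostedcluster example -n clusters \
        "hypershift.openshift.io/force-upgrade-to=4.15.6" --overwrite
      oc patch hostedcluster example -n clusters --type=merge \
        -p '{"spec":{"release":{"image":"quay.io/openshift-release-dev/ocp-release:4.15.6-x86_64"}}}'
      ---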

      Version-Release number of selected component (if applicable):

      Tested a control plane upgrade from 4.15.5 to 4.15.6, but I believe the specific versions are not important.

      How reproducible:

      100%    

      Steps to Reproduce:

      What I did:
          1. Create a ROSA HCP cluster (not the latest version so that there are available upgrades)
          2. Initiate a control plane upgrade
          3. Remove the `0.0.0.0/0 -> NAT Gateway` route from the private subnet's route table in the cluster's VPC to simulate zero worker nodes (see the AWS CLI sketch after this list)
          4. Observe how the control plane upgrade progresses
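
      Step 3 can be done with the AWS CLI; a sketch, where the route table ID is a placeholder you would look up for the private subnet:
      ---
      # Drop the default route to the NAT gateway so worker nodes lose
      # connectivity to the control plane (simulating zero workers).
      # rtb-0123456789abcdef0 is a placeholder route table ID.
      aws ec2 delete-route \
        --route-table-id rtb-0123456789abcdef0 \
        --destination-cidr-block 0.0.0.0/0
      ---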
      What I think will work in general:
          1. Create a HyperShift cluster
          2. Remove all NodePools/worker nodes from the cluster (see the sketch after this list)
          3. Attempt to perform a control plane upgrade
          4. Observe how the control plane upgrade progresses     
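
      For step 2, scaling every NodePool to zero replicas (or deleting them) should produce the same zero-worker state; a sketch, assuming the NodePools live in the `clusters` namespace:
      ---
      # Scale all NodePools of the hosted cluster down to zero.
      # The "clusters" namespace is an assumption; adjust as needed.
      for np in $(oc get nodepool -n clusters -o name); do
        oc patch "$np" -n clusters --type=merge -p '{"spec":{"replicas":0}}'
      done
      ---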

      Actual results:

      The control plane upgrade is unable to progress; cluster operators that depend on pods running on the worker nodes remain in a Progressing state indefinitely:
      ---
      From `oc get clusterversion version -oyaml`
      ---
        - lastTransitionTime: "2024-04-17T19:40:10Z"
          message: |-
            Multiple errors are preventing progress:
            * Cluster operator image-registry is updating versions
            * Cluster operator ingress is degraded
            * deployment openshift-cluster-samples-operator/cluster-samples-operator is Progressing=False: ProgressDeadlineExceeded: ReplicaSet "cluster-samples-operator-75f4888746" has timed out progressing.
            * deployment openshift-console-operator/console-operator is Progressing=False: ProgressDeadlineExceeded: ReplicaSet "console-operator-77cfbc674b" has timed out progressing.
            * deployment openshift-insights/insights-operator is not available MinimumReplicasUnavailable (Deployment does not have minimum availability.) or progressing ProgressDeadlineExceeded (ReplicaSet "insights-operator-55846d47d8" has timed out progressing.)
            * deployment openshift-kube-storage-version-migrator-operator/kube-storage-version-migrator-operator is not available MinimumReplicasUnavailable (Deployment does not have minimum availability.) or progressing ProgressDeadlineExceeded (ReplicaSet "kube-storage-version-migrator-operator-775b779cd7" has timed out progressing.)
            * deployment openshift-monitoring/cluster-monitoring-operator is Progressing=False: ProgressDeadlineExceeded: ReplicaSet "cluster-monitoring-operator-b6cc85c5d" has timed out progressing.
            * deployment openshift-service-ca-operator/service-ca-operator is Progressing=False: ProgressDeadlineExceeded: ReplicaSet "service-ca-operator-c7b4dbb55" has timed out progressing.
          reason: MultipleErrors
          status: "True"
          type: Failing    
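
      The blocked state can also be inspected directly; for example:
      ---
      # From the hosted cluster's kubeconfig: list operator status and pull
      # the Failing condition message out of the ClusterVersion.
      oc get clusteroperators
      oc get clusterversion version \
        -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'
      ---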

      Expected results:

      The control plane upgrade is able to "complete"; I am open to discussing the definition of "complete".
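
      One way to check completion from the management cluster is the HostedCluster's reported version history; a sketch (the cluster name and namespace are placeholders):
      ---
      # Show the most recent version entry and whether it reached
      # "Completed" or is stuck at "Partial". Names are placeholders.
      oc get hostedcluster example -n clusters \
        -o jsonpath='{.status.version.history[0].version} {.status.version.history[0].state}{"\n"}'
      ---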

      Additional info:

          

              Assignee: Alberto Garcia Lamela (agarcial@redhat.com)
              Reporter: Michael Shen (mshen.openshift)
              QA Contact: Jie Zhao