-
Bug
-
Resolution: Not a Bug
-
Critical
-
None
-
4.15.z
-
None
-
Moderate
-
None
-
Hypershift Sprint 253
-
1
-
False
-
Description of problem:
In ROSA HCP, the `hypershift.openshift.io/force-upgrade-to` annotation is used for all OCP upgrades (z-stream and y-stream). Although this annotation allows a control plane upgrade to start even when the `ClusterVersionUpgradeable` status condition on the HostedCluster is `False`, it does not guarantee that the upgrade will succeed.
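For illustration only, the sketch below builds the kind of merge patch that would set this annotation on a HostedCluster. It assumes the annotation value is the target release image and that the same image is set in `spec.release.image`; the helper name and the pullspec are hypothetical, and on ROSA HCP the service applies this itself rather than the user.

```python
import json

# Annotation named in this report.
FORCE_UPGRADE_ANNOTATION = "hypershift.openshift.io/force-upgrade-to"


def build_forced_upgrade_patch(release_image: str) -> dict:
    """Return a JSON merge patch body for a HostedCluster (hypothetical helper).

    Assumption: the annotation value must equal the release image being
    requested in spec.release.image.
    """
    if not release_image:
        raise ValueError("release_image must be a non-empty pullspec")
    return {
        "metadata": {"annotations": {FORCE_UPGRADE_ANNOTATION: release_image}},
        "spec": {"release": {"image": release_image}},
    }


if __name__ == "__main__":
    # Hypothetical pullspec, used only to show the shape of the patch.
    patch = build_forced_upgrade_patch("registry.example.invalid/ocp-release:4.15.6")
    print(json.dumps(patch, indent=2))
```

Note that the patch only lets the upgrade start; as described above, nothing in it addresses whether the upgrade can finish.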
Version-Release number of selected component (if applicable):
Tested a control plane upgrade from 4.15.5 to 4.15.6, but I believe the specific versions are not important.
How reproducible:
100%
Steps to Reproduce:
What I did:
1. Create a ROSA HCP cluster (not on the latest version, so that upgrades are available)
2. Initiate a control plane upgrade
3. Remove the `0.0.0.0/0 -> NAT Gateway` route from the route table of the cluster VPC's private subnet to simulate zero worker nodes
4. Observe how the control plane upgrade progresses

What I think will work in general:
1. Create a HyperShift cluster
2. Remove all nodepools/worker nodes from the cluster
3. Attempt to perform a control plane upgrade
4. Observe how the control plane upgrade progresses
Actual results:
The control plane upgrade is unable to progress. Cluster operators that depend on pods running on the worker nodes remain in a progressing state indefinitely:
--- From `oc get clusterversion version -oyaml` ---
- lastTransitionTime: "2024-04-17T19:40:10Z"
  message: |-
    Multiple errors are preventing progress:
    * Cluster operator image-registry is updating versions
    * Cluster operator ingress is degraded
    * deployment openshift-cluster-samples-operator/cluster-samples-operator is Progressing=False: ProgressDeadlineExceeded: ReplicaSet "cluster-samples-operator-75f4888746" has timed out progressing.
    * deployment openshift-console-operator/console-operator is Progressing=False: ProgressDeadlineExceeded: ReplicaSet "console-operator-77cfbc674b" has timed out progressing.
    * deployment openshift-insights/insights-operator is not available MinimumReplicasUnavailable (Deployment does not have minimum availability.) or progressing ProgressDeadlineExceeded (ReplicaSet "insights-operator-55846d47d8" has timed out progressing.)
    * deployment openshift-kube-storage-version-migrator-operator/kube-storage-version-migrator-operator is not available MinimumReplicasUnavailable (Deployment does not have minimum availability.) or progressing ProgressDeadlineExceeded (ReplicaSet "kube-storage-version-migrator-operator-775b779cd7" has timed out progressing.)
    * deployment openshift-monitoring/cluster-monitoring-operator is Progressing=False: ProgressDeadlineExceeded: ReplicaSet "cluster-monitoring-operator-b6cc85c5d" has timed out progressing.
    * deployment openshift-service-ca-operator/service-ca-operator is Progressing=False: ProgressDeadlineExceeded: ReplicaSet "service-ca-operator-c7b4dbb55" has timed out progressing.
  reason: MultipleErrors
  status: "True"
  type: Failing
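To make the stuck state easier to spot, a minimal sketch along these lines could pull the blocking items out of the `Failing` condition. It assumes the ClusterVersion has been loaded as a dictionary (for example from `oc get clusterversion version -o json`); the function names are hypothetical and the sample message is abbreviated from the output above.

```python
from typing import List, Optional


def find_condition(conditions: List[dict], cond_type: str) -> Optional[dict]:
    """Return the first status condition with the given type, if any."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def blocking_items(cluster_version: dict) -> List[str]:
    """List the bullet items from the Failing condition message.

    Returns an empty list when the Failing condition is absent or not "True".
    """
    conditions = cluster_version.get("status", {}).get("conditions", [])
    failing = find_condition(conditions, "Failing")
    if failing is None or failing.get("status") != "True":
        return []
    items = []
    for line in failing.get("message", "").splitlines():
        stripped = line.strip()
        if stripped.startswith("* "):
            items.append(stripped[2:].strip())
    return items


if __name__ == "__main__":
    # Abbreviated sample based on the condition shown in this report.
    sample = {
        "status": {
            "conditions": [
                {
                    "type": "Failing",
                    "status": "True",
                    "reason": "MultipleErrors",
                    "message": (
                        "Multiple errors are preventing progress:\n"
                        "* Cluster operator image-registry is updating versions\n"
                        "* Cluster operator ingress is degraded"
                    ),
                }
            ]
        }
    }
    for item in blocking_items(sample):
        print(item)
```

Every item in the full message above is a workload that needs worker nodes to schedule, which is consistent with the upgrade never completing while the data plane is unavailable.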
Expected results:
The control plane upgrade is able to "complete" - open to discussing the definition of "complete"
Additional info:
- is depended on by
  - HOSTEDCP-1517 Control plane upgrades should succeed regardless of data plane state (In Progress)
- is related to
  - RFE-5522 OVN control-plane vs. data-plane skew within a z stream (Accepted)
  - HOSTEDCP-1146 e2e for Control Plane Release image (To Do)