-
Story
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
-
True
-
Waiting on feedback from @James Harrington on the proposed ADR
-
False
-
OCPSTRAT-1751 - Streamline and decouple Control-plane and NodePool Upgrades Management/Reporting
-
-
-
Hypershift Sprint 253, Hypershift Sprint 254, Hypershift Sprint 255, Hypershift Sprint 256, Hypershift Sprint 257, Hypershift Sprint 258, Hypershift Sprint 259, Hypershift Sprint 262
-
0
-
0
-
0
User Story:
As an ARO/ROSA HCP SRE, I want to be able to manage (initiate and ensure the completion of) control plane upgrades on behalf of customers. Currently, we cannot ensure the completion of control plane upgrades because an upgrade pauses indefinitely if ClusterOperators on the customer's worker nodes are not healthy, especially the console, image-registry, and monitoring ClusterOperators. There may also be other edge cases that we are not yet aware of from an SRE perspective.
Furthermore, as a managed service provider of HyperShift, I want to be able to ensure that the containers running on management clusters do not have CVEs. This means that I need to have a mechanism for updating HCP containers running on management clusters without depending on a healthy data plane.
One possible way to validate this feature is to check whether a HyperShift control plane upgrade can complete when there are 0 worker nodes.
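The validation idea above could be sketched roughly as follows. This is a minimal illustration, not the agreed test design: it works on already-parsed HostedCluster and NodePool objects (plain dicts, for example from `oc get -o json`), and it assumes that the HostedCluster exposes a ClusterVersion-style `status.version.history` list (newest entry first, with `image` and `state` fields) and that each NodePool carries `spec.replicas`. The function names are hypothetical.

```python
from typing import Any, Mapping, Sequence


def control_plane_upgrade_completed(
    hosted_cluster: Mapping[str, Any], target_image: str
) -> bool:
    """Return True if the newest history entry is the target image and is Completed.

    Assumption: status.version.history is ordered newest-first and each entry
    has "image" and "state" keys, as in a ClusterVersion update history.
    """
    history = (
        hosted_cluster.get("status", {}).get("version", {}).get("history", [])
    )
    if not history:
        return False
    latest = history[0]
    return (
        latest.get("image") == target_image
        and latest.get("state") == "Completed"
    )


def total_worker_replicas(node_pools: Sequence[Mapping[str, Any]]) -> int:
    """Sum spec.replicas across NodePools, treating a missing value as 0."""
    return sum(
        int(pool.get("spec", {}).get("replicas") or 0) for pool in node_pools
    )


def validates_zero_worker_upgrade(
    hosted_cluster: Mapping[str, Any],
    node_pools: Sequence[Mapping[str, Any]],
    target_image: str,
) -> bool:
    """True only if the upgrade completed while there were no worker nodes."""
    return total_worker_replicas(node_pools) == 0 and (
        control_plane_upgrade_completed(hosted_cluster, target_image)
    )
```

A CI job could poll the HostedCluster until `validates_zero_worker_upgrade` returns True or a timeout expires; the polling and timeout policy are left out here because the story does not specify them.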
(optional) Out of Scope:
- Ensuring the completion of a control plane upgrade if the control plane itself is degraded (kube-apiserver, etcd, etc.)
Engineering Details:
- SD-ADR-0212: Fully Manage HCP Control Plane z-stream Versions
- It may be possible to leverage OTA-540
- We would like this to be an "OCP-supported feature" that's backed by regular validation of HyperShift control plane upgrades in CI/QE. For example, this may be accomplished with the hypershift.openshift.io/force-upgrade-to annotation, but if that becomes the recommended solution, every control plane upgrade will leverage this path.
- We would like regular CI tests for control plane upgrades with zero worker nodes
- depends on
-
OCPBUGS-32382 Control Plane Upgrade unable to complete with zero worker nodes
- Closed
- is blocked by
-
OCPBUGS-38132 OIDC IDP validation check should not be fatal to CPO reconcilation
- Verified
- is related to
-
OTA-540 Do not halt in progress updates on Degraded operators
- New
-
RFE-5522 OVN control-plane vs. data-plane skew within a z stream
- Accepted
-
HOSTEDCP-1146 e2e for Control Plane Release image
- To Do