Feature
Resolution: Unresolved
Major
BU Product Work
67% To Do, 33% In Progress, 0% Done
Program Call
Feature Overview (aka. Goal Summary)
A common concern when handling escalations and incidents in Managed OpenShift Hosted Control Planes is the time to resolution when the fix must be delivered in a component that ships within the OpenShift release payload. This is because OpenShift's release payloads:
- Have a hotfix process that is customer/support-exception targeted rather than fleet targeted
- Can take weeks to be available for Managed OpenShift
This feature seeks to provide mechanisms that bring the upper time bound for delivering such fixes in line with the current HyperShift Operator expectation of under 24 hours.
Goals (aka. expected user outcomes)
- Hosted Control Plane fixes are delivered through Konflux builds
- No additional upgrade edges
- Release specific
- Adequate, fleet representative, automated testing coverage
- Reduced human interaction
Requirements (aka. Acceptance Criteria):
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
- Overriding Hosted Control Plane components can be done automatically once the PRs are ready and the affected versions have been properly identified
- Managed OpenShift Hosted Clusters have the Control Plane fix applied without requiring customer intervention and without workload disruption beyond what the incident being resolved may already have caused
- The fix can be promoted through integration, stage, and production canary with a good degree of observability
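The requirements above (selecting clusters by affected version, then promoting through integration, stage, and production canary) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the planned implementation: `HostedCluster`, `clusters_needing_fix`, `promote`, the environment names, and the `apply_override` and `is_healthy` callbacks are all hypothetical and stand in for whatever tooling and health signals the real process would use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical environment names, ordered as in the requirement above.
PROMOTION_ORDER = ("integration", "stage", "production-canary")


@dataclass(frozen=True)
class HostedCluster:
    # Hypothetical record; the real fleet inventory will look different.
    name: str
    environment: str
    release_version: str  # illustrative value such as "4.16.3"


def clusters_needing_fix(
    clusters: Iterable[HostedCluster], affected_versions: Iterable[str]
) -> List[HostedCluster]:
    """Keep only clusters whose release version was identified as affected."""
    affected = set(affected_versions)
    return [c for c in clusters if c.release_version in affected]


def promote(
    clusters: Iterable[HostedCluster],
    affected_versions: Iterable[str],
    apply_override: Callable[[HostedCluster], None],
    is_healthy: Callable[[HostedCluster], bool],
) -> List[str]:
    """Apply the override one environment at a time.

    Promotion stops at the first environment where any targeted cluster
    fails the health check. Returns the environments that passed.
    """
    targets = clusters_needing_fix(clusters, affected_versions)
    promoted: List[str] = []
    for env in PROMOTION_ORDER:
        env_targets = [c for c in targets if c.environment == env]
        for cluster in env_targets:
            apply_override(cluster)
        if not all(is_healthy(c) for c in env_targets):
            break  # do not promote further; a human can investigate
        promoted.append(env)
    return promoted
```

Gating each environment on a health check keeps human interaction low in the common case while still halting the rollout before it reaches production canary if an earlier stage regresses.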
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, also provide the OCPSTRAT that tracks future support for that configuration.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | managed (ROSA and ARO) |
Classic (standalone cluster) | No |
Hosted control planes | Yes |
Multi node, Compact (three node), or Single node (SNO), or all | All supported ROSA/HCP topologies |
Connected / Restricted Network | All supported ROSA/HCP topologies |
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | All supported ROSA/HCP topologies |
Operator compatibility | CPO and Operators depending on it |
Backport needed (list applicable versions) | TBD |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | No |
Other (please specify) | No |
Use Cases (Optional):
- Incident response when the engineering solution lies partially or completely on the Hosted Control Plane side rather than in the HyperShift Operator
Out of Scope
- HyperShift Operator binary bundling
Background
Discussed previously during incident calls. Design discussion document
Customer Considerations
- Because the Managed Control Plane version does not change but is overridden, customer visibility and impact should be limited as much as possible.
Documentation Considerations
SOP needs to be defined for:
- Requesting and approving the fleet-wide fixes described above
- Building and delivering them
- Identifying clusters with deployed fleet-wide fixes
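As one possible starting point for the last SOP item, identifying clusters that carry a fleet-wide fix could be as simple as filtering on a marker recorded when the override is applied. The sketch below assumes such a marker exists as an annotation; the key `example.com/fleet-fix-id` and the function name are hypothetical placeholders, not an existing HyperShift or OCM convention.

```python
from typing import Dict, Mapping

# Hypothetical annotation key; the real marker is still to be defined.
FLEET_FIX_ANNOTATION = "example.com/fleet-fix-id"


def clusters_with_fleet_fix(
    annotations_by_cluster: Mapping[str, Mapping[str, str]]
) -> Dict[str, str]:
    """Map cluster name to fix identifier for clusters carrying the marker."""
    found: Dict[str, str] = {}
    for name, annotations in annotations_by_cluster.items():
        fix_id = annotations.get(FLEET_FIX_ANNOTATION)
        if fix_id:  # skip clusters with a missing or empty marker
            found[name] = fix_id
    return found
```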