-
Feature
-
Resolution: Unresolved
-
Major
-
None
-
None
-
BU Product Work
-
False
-
-
False
-
100% To Do, 0% In Progress, 0% Done
-
7
-
0
-
Program Call
Feature Overview (aka. Goal Summary)
Facilitate debugging of clusters stuck during the upgrade process in HyperShift. This feature aims to provide clear, actionable insights into what conditions or issues are blocking cluster upgrades, thereby significantly reducing the time and effort needed to troubleshoot and resolve these issues.
Goals (aka. expected user outcomes)
The primary outcome is for users, particularly system administrators and (cluster service providers in HCP terminology), to quickly identify and resolve upgrade failures in HyperShift clusters. This feature will expand the functionality of the existing status and metrics system to include detailed indicators of upgrade progress and specific blocking conditions.
Requirements (aka. Acceptance Criteria):
- Provide clear status messages indicating the current stage of cluster upgrades.
- Highlight specific conditions that are blocking the upgrades, including degraded operators, etc.
- Integrate these indicators into the existing monitoring and metrics systems.
- Ensure compatibility with self-managed and managed deployments.
- Support for all architectures: x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x).
- Ensure the feature is secure, reliable, maintainable, and scalable.
- Backport to applicable versions as needed.
Deployment considerations
Deployment considerations | List applicable specific needs (N/A = not applicable) |
---|---|
Self-managed, managed, or both | Both |
Classic (standalone cluster) | N/A |
Hosted control planes | Applicable |
Multi node, Compact (three node), or Single node (SNO), or all | N/A |
Connected / Restricted Network | Both |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | All architectures |
Operator compatibility | Must be compatible with relevant operators |
Backport needed (list applicable versions) | Specify versions as identified during refinement |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | Integration with OpenShift Console and OCM |
Other (please specify) | N/A |
Use Cases:
- System administrator troubleshooting an upgrade failure.
- Automated systems monitoring and alerting on upgrade progress and failures.
- Post-mortem analysis of upgrade issues to prevent future occurrences.
Questions to Answer:
- What specific conditions are most commonly blocking cluster upgrades?
- How can these conditions be automatically detected and reported?
- What level of detail is needed in the status messages to be most useful?
Out of Scope
- Debugging non-upgrade related issues.
- Integration with third-party monitoring tools beyond the scope of OpenShift Console and OCM.
Background
Current debugging processes for clusters stuck in upgrades are manual and time-consuming. This feature aims to streamline and improve the efficiency of the debugging process by providing automated, clear insights into blocking conditions.
Documentation Considerations
Detailed documentation on how to interpret new status messages and metrics will be required. This should include troubleshooting guides and examples. Any changes should be reflected in the existing HyperShift and OpenShift documentation.
Interoperability Considerations
This feature impacts HyperShift upgrades on ROSA, ARO, and self-managed HCP. Interoperability test scenarios should include these environments to ensure consistent behavior and reliability across the portfolio.
- clones
-
OCPSTRAT-1598 Enhanced Debuggability for HyperShift Cluster Installation Failures
- Backlog
- is depended on by
-
RFE-6162 Facilitate debugging clusters stuck installing
- Accepted
- links to