-
Feature
-
Resolution: Unresolved
-
Major
-
None
-
None
-
BU Product Work
-
False
-
-
False
-
100% To Do, 0% In Progress, 0% Done
-
6
-
0
-
Program Call
Feature Overview (aka. Goal Summary)
Facilitate debugging of clusters stuck during the installation process in HyperShift. This feature aims to provide clear, actionable insights into what conditions or issues are blocking cluster installation, thereby significantly reducing the time and effort needed to troubleshoot and resolve these issues.
Goals (aka. expected user outcomes)
The primary outcome is for users, particularly system administrators and (cluster service providers in HCP terminology), to quickly identify and resolve installation blockages in HyperShift clusters. This feature will expand the functionality of the existing status and metrics system to include detailed indicators of installation progress and specific blocking conditions.
Requirements (aka. Acceptance Criteria):
- Provide clear status messages indicating the current stage of cluster installation.
- Highlight specific conditions that are blocking the installation, including DNS reachability, certificate generation, and resource quotas.
- Integrate these indicators into the existing monitoring and metrics systems.
- Ensure compatibility with self-managed and managed deployments.
- Support for all architectures: x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x).
- Ensure the feature is secure, reliable, maintainable, and scalable.
- Backport to applicable versions as needed.
Deployment considerations
Deployment considerations | List applicable specific needs (N/A = not applicable) |
---|---|
Self-managed, managed, or both | Both |
Classic (standalone cluster) | N/A |
Hosted control planes | Applicable |
Multi node, Compact (three node), or Single node (SNO), or all | N/A |
Connected / Restricted Network | Both |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | All architectures |
Operator compatibility | Must be compatible with relevant operators |
Backport needed (list applicable versions) | Specify versions as identified during refinement |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | Integration with OpenShift Console and OCM |
Other (please specify) | N/A |
Use Cases:
- System administrator troubleshooting an installation failure.
- Automated systems monitoring and alerting on installation progress and blockages.
- Post-mortem analysis of installation issues to prevent future occurrences.
Questions to Answer:
- What specific conditions are most commonly blocking cluster installations?
- How can these conditions be automatically detected and reported?
- What level of detail is needed in the status messages to be most useful?
Out of Scope
- Debugging non-installation related issues.
- Integration with third-party monitoring tools beyond the scope of OpenShift Console and OCM.
Background
Current debugging processes for clusters stuck in installation are manual and time-consuming. This feature aims to streamline and improve the efficiency of the debugging process by providing automated, clear insights into blocking conditions.
Documentation Considerations
Detailed documentation on how to interpret new status messages and metrics will be required. This should include troubleshooting guides and examples. Any changes should be reflected in the existing HyperShift and OpenShift documentation.
Interoperability Considerations
This feature impacts HyperShift installations on ROSA, ARO, and self-managed HCP. Interoperability test scenarios should include these environments to ensure consistent behavior and reliability across the portfolio.
- is cloned by
-
OCPSTRAT-1615 Enhanced Debuggability for HyperShift Cluster NodePool Failures
- New
-
OCPSTRAT-1632 [Spike] Enhanced Debuggability for HyperShift Cluster Upgrade Failures
- New
- is depended on by
-
RFE-6162 Facilitate debugging clusters stuck installing
- Accepted
- links to