OCPSTRAT-1632

[Spike] Enhanced Debuggability for HyperShift Cluster Upgrade Failures

      Feature Overview (aka. Goal Summary)

      Facilitate debugging of clusters stuck during the upgrade process in HyperShift. This feature aims to provide clear, actionable insights into what conditions or issues are blocking cluster upgrades, thereby significantly reducing the time and effort needed to troubleshoot and resolve these issues.

      Goals (aka. expected user outcomes)

      The primary outcome is for users, particularly system administrators (cluster service providers, in HCP terminology), to quickly identify and resolve upgrade failures in HyperShift clusters. This feature will extend the existing status and metrics system with detailed indicators of upgrade progress and of the specific conditions blocking an upgrade.

      Requirements (aka. Acceptance Criteria):

      1. Provide clear status messages indicating the current stage of cluster upgrades.
      2. Highlight the specific conditions that are blocking an upgrade, such as degraded operators.
      3. Integrate these indicators into the existing monitoring and metrics systems.
      4. Ensure compatibility with self-managed and managed deployments.
      5. Support all architectures: x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x).
      6. Ensure the feature is secure, reliable, maintainable, and scalable.
      7. Backport to applicable versions as needed.
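
To make requirements 1 and 2 concrete, the following is a minimal sketch of how a status message could combine the current upgrade stage with the conditions that block it. It assumes Kubernetes-style status conditions (type, status, reason, message). The `BLOCKING_WHEN` rule set, the stage names, and the message wording are hypothetical and are not taken from HyperShift.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Condition:
    """Kubernetes-style status condition (type/status/reason/message)."""
    type: str
    status: str  # "True", "False", or "Unknown"
    reason: str
    message: str


# Hypothetical rule set: condition type -> status value that blocks an upgrade.
BLOCKING_WHEN = {
    "Degraded": "True",
    "Upgradeable": "False",
    "Available": "False",
}


def blocking_conditions(conditions: List[Condition]) -> List[Condition]:
    """Return the conditions whose status matches a blocking rule."""
    return [c for c in conditions if BLOCKING_WHEN.get(c.type) == c.status]


def summarize(stage: str, conditions: List[Condition]) -> str:
    """Build a one-line status message: current stage plus any blockers."""
    blockers = blocking_conditions(conditions)
    if not blockers:
        return f"Upgrade stage: {stage}; no blocking conditions detected"
    details = "; ".join(
        f"{c.type}={c.status} ({c.reason}: {c.message})" for c in blockers
    )
    return f"Upgrade stage: {stage}; blocked by {len(blockers)} condition(s): {details}"
```

Keeping the blocking rules in a single table means the spike can adjust which conditions count as blockers without touching the message formatting.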

      Deployment considerations

      Deployment considerations: List applicable specific needs (N/A = not applicable)
      Self-managed, managed, or both: Both
      Classic (standalone cluster): N/A
      Hosted control planes: Applicable
      Multi node, Compact (three node), or Single node (SNO), or all: N/A
      Connected / Restricted Network: Both
      Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): All architectures
      Operator compatibility: Must be compatible with relevant operators
      Backport needed (list applicable versions): Specify versions as identified during refinement
      UI need (e.g. OpenShift Console, dynamic plugin, OCM): Integration with OpenShift Console and OCM
      Other (please specify): N/A

      Use Cases:

      • System administrator troubleshooting an upgrade failure.
      • Automated systems monitoring and alerting on upgrade progress and failures.
      • Post-mortem analysis of upgrade issues to prevent future occurrences.

      Questions to Answer:

      • What specific conditions are most commonly blocking cluster upgrades?
      • How can these conditions be automatically detected and reported?
      • What level of detail is needed in the status messages to be most useful?
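
The first question could be approached by tallying the blocking conditions recorded for past stuck upgrades. This is a minimal sketch that assumes each record is a hypothetical `(condition_type, reason)` pair collected during the spike; the sample values in the usage note are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (condition_type, reason), both hypothetical


def top_blocking_reasons(
    records: Iterable[Record], limit: int = 3
) -> List[Tuple[Record, int]]:
    """Count identical (condition_type, reason) pairs, most frequent first."""
    return Counter(records).most_common(limit)
```

For example, calling `top_blocking_reasons` on two `("Degraded", "OperatorDegraded")` records and one `("Upgradeable", "AdminAckRequired")` record ranks the degraded-operator pair first with a count of 2.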

      Out of Scope

      • Debugging non-upgrade related issues.
      • Integration with monitoring tools other than OpenShift Console and OCM, including third-party tools.

      Background

      Current debugging processes for clusters stuck in upgrades are manual and time-consuming. This feature aims to streamline and improve the efficiency of the debugging process by providing automated, clear insights into blocking conditions.

      Documentation Considerations

      Detailed documentation on how to interpret new status messages and metrics will be required. This should include troubleshooting guides and examples. Any changes should be reflected in the existing HyperShift and OpenShift documentation.

      Interoperability Considerations

      This feature impacts HyperShift upgrades on ROSA, ARO, and self-managed HCP. Interoperability test scenarios should include these environments to ensure consistent behavior and reliability across the portfolio.

              Unassigned
              Adel Zaalouk (azaalouk)
              Matthew Werner