OCPBUGS-74648

HostedCluster reports Available before control plane components have successfully rolled out

    • Rejected
    • In Progress
    • Bug Fix
      * With this update, the system uses the `ControlPlaneComponent` resource to ensure that the HostedControlPlane and HostedCluster are reported as available only after all control plane components have successfully rolled out. Tracking the rollout status of each control plane component reduces the risk of unexpected failures during the initial provisioning of a HostedCluster. (link:https://issues.redhat.com/browse/OCPBUGS-74648[OCPBUGS-74648])

      Cause: The HostedControlPlane controller sets the HostedControlPlaneAvailable condition to True without checking whether all control plane components have successfully rolled out.

      Consequence: Customers may interact with the cluster after HostedClusterAvailable is set to True, but before all control plane components are truly ready, leading to potential failures or unexpected behavior.

      Fix: Added a new controlPlaneComponentsAvailable check that lists all ControlPlaneComponent resources and verifies that each has its Available condition set to True before HostedControlPlaneAvailable is set to True. Once the cluster reaches the available state, subsequent component rollouts (e.g., during upgrades) do not flip the condition back to False.

      Result: The HostedCluster and HostedControlPlane now only report Available=True after all control plane components have successfully completed their initial rollout, preventing premature customer interaction with an incomplete control plane.
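      The gating logic described under Fix can be sketched roughly as follows. This is a minimal illustration, not the HyperShift implementation: the `Component` type, the `componentsAvailable` and `nextAvailable` helpers, and the component names are hypothetical stand-ins, and treating an empty component list as "not available" is an assumption. It covers only the component check, not any other prerequisites the controller may evaluate.

```go
package main

import "fmt"

// ConditionStatus and Component are simplified stand-ins for illustration;
// they are NOT the HyperShift API types.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

// Component models a ControlPlaneComponent reduced to its name and the
// status of its Available condition.
type Component struct {
	Name      string
	Available ConditionStatus
}

// componentsAvailable reports whether every listed component has
// Available=True, and returns the names of those that do not.
// Assumption: an empty list counts as not available, because nothing
// has been observed yet.
func componentsAvailable(components []Component) (bool, []string) {
	var pending []string
	for _, c := range components {
		if c.Available != ConditionTrue {
			pending = append(pending, c.Name)
		}
	}
	return len(components) > 0 && len(pending) == 0, pending
}

// nextAvailable computes the next value of the availability condition.
// Once the condition has been True it stays True, so later rollouts
// (for example during upgrades) do not flip it back.
func nextAvailable(alreadyAvailable bool, components []Component) bool {
	if alreadyAvailable {
		return true
	}
	ok, _ := componentsAvailable(components)
	return ok
}

func main() {
	// Illustrative component names only.
	rollingOut := []Component{
		{Name: "kube-apiserver", Available: ConditionTrue},
		{Name: "etcd", Available: ConditionFalse},
	}
	fmt.Println(nextAvailable(false, rollingOut)) // initial provisioning: gated
	fmt.Println(nextAvailable(true, rollingOut))  // already available: sticky
}
```

      Passing the previous value of the condition into `nextAvailable` is what makes the result sticky: the component check only gates the first transition to True.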

      This is a clone of issue OCPBUGS-74336. The following is the description of the original issue:

      Description of problem

      The HostedCluster and HostedControlPlane report the Available condition as true before all control plane components have successfully rolled out. Now that the ControlPlaneComponent resource represents each control plane component, we can determine whether each component has successfully rolled out. The HostedControlPlaneAvailable condition (and the mirrored HostedClusterAvailable condition) should not be set to true until the control plane components have successfully completed their initial rollout.

      Version-Release number of selected component (if applicable)

      4.22

      How reproducible

      Always

      Steps to Reproduce

      1. Create a new HostedCluster
      2. Observe the HostedCluster conditions during initial provisioning
      3. Note when the Available condition becomes true
      4. Compare against the actual rollout status of control plane components via ControlPlaneComponent resources
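      The comparison in step 4 can be expressed as a small check over an observed snapshot. This is a hedged sketch: the `Snapshot` type and `prematureComponents` helper are hypothetical and only model data that would have been collected by hand from the HostedCluster conditions and the ControlPlaneComponent resources; the component names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot is a hypothetical record of one observation taken during
// provisioning; it is NOT a HyperShift API type.
type Snapshot struct {
	// HostedClusterAvailable is true if the HostedCluster reported
	// Available=True at the time of the observation.
	HostedClusterAvailable bool
	// ComponentAvailable maps a ControlPlaneComponent name to whether its
	// Available condition was True at the same time.
	ComponentAvailable map[string]bool
}

// prematureComponents returns the sorted names of components that were not
// yet available although the HostedCluster already reported Available=True.
// It returns nil when the HostedCluster itself is not available.
func prematureComponents(s Snapshot) []string {
	if !s.HostedClusterAvailable {
		return nil
	}
	var names []string
	for name, ok := range s.ComponentAvailable {
		if !ok {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	s := Snapshot{
		HostedClusterAvailable: true,
		ComponentAvailable: map[string]bool{
			"kube-apiserver":          true,
			"kube-controller-manager": false,
		},
	}
	// A non-empty result reproduces the bug for this observation.
	fmt.Println(prematureComponents(s))
}
```

      With the fix in place, a snapshot taken during initial provisioning would be expected to yield an empty result whenever `HostedClusterAvailable` is true.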

      Actual results

      The HostedCluster and HostedControlPlane may report Available=True before all control plane components have completed their initial rollout.

      Expected results

      The HostedControlPlaneAvailable condition (and the mirrored HostedClusterAvailable condition in the HostedCluster) should only be set to true after all control plane components represented by ControlPlaneComponent resources have successfully rolled out.

      Additional info

      The ControlPlaneComponent resource (part of CPOv2) provides the necessary information to track rollout status of each control plane component. This information should be used to gate the availability conditions.

              sjenning Seth Jennings
              rh-ee-brcox Bryan Cox
              Wen Wang