Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-58505

HCP Clusters Stuck Installing/Uninstalling due to ACM not Coming Up [release-4.16]

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done-Errata
    • Icon: Undefined Undefined
    • 4.16.z
    • 4.16.z
    • HyperShift
    • Quality / Stability / Reliability
    • False
    • Hide

      None

      Show
      None
    • None
    • None
    • None
    • None
    • None
    • None
    • Done
    • Bug Fix
    • Hide
      Before this update, certificate issues with the API endpoint prevented the Advanced Cluster Management (ACM) agent from starting during cluster install, causing installations to hang and uninstallations to get stuck. With this release, certificate issues have been resolved for the API endpoint, preventing the ACM agent from hanging during installation due to certificate issues. (link:https://issues.redhat.com/browse/OCPBUGS-58505[OCPBUGS-58505])
      Show
      Before this update, certificate issues with the API endpoint prevented the Advanced Cluster Management (ACM) agent from starting during cluster install, causing installations to hang and uninstallations to get stuck. With this release, certificate issues have been resolved for the API endpoint, preventing the ACM agent from hanging during installation due to certificate issues. (link: https://issues.redhat.com/browse/OCPBUGS-58505 [ OCPBUGS-58505 ])
    • None
    • None
    • None
    • None

      Description of problem:

      ITN-2025-00159

      Summary of cause known at this point: On install, certificate issues with the API endpoint in the affected versions is preventing ACM agent from coming up for that cluster. This hangs the installation process. At uninstallation time, those ACM agents are responsible for removing finalizers, which causes the clusters to get stuck uninstalling. 

       

      Version-Release number of selected component (if applicable):

      Affected cluster versions: 
      4.15.{52,53,54}
      4.16.{43}
      4.17.{34}

       

      How reproducible:

      Reproduced on cluster installs

      Steps to Reproduce:

      1. Provision an HCP cluster with one of the affected versions listed above.
      2. Login to the management cluster and look for logs containing "BootstrapSecretMissing,HubKubeConfigSecretMissing".
      oc describe klusterlet klusterlet-$id 

      Actual results:

      Cluster will hang while coming up and have the following error logs:

      2025-07-07 19:45:14 +0000 UTC hostedclusters creed-hcp2-test configuration is invalid: NamedCertificates get secret: Invalid value: "cluster-api-cert": Secret "cluster-api-cert" not found
      2025-07-07 19:45:14 +0000 UTC hostedclusters creed-hcp2-test ValidConfiguration condition is false: NamedCertificates get secret: Invalid value: "cluster-api-cert": Secret "cluster-api-cert" not found 

       

      Expected results:

      Cluster installs successfully and ACM comes up

       

      Additional info:

      Check affected clusters script [TODO ADD]

              Unassigned Unassigned
              dalong.openshift Dakota Long
              None
              None
              He Liu He Liu
              None
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Created:
                Updated:
                Resolved: