-
Bug
-
Resolution: Done
-
Major
-
None
-
4.10.z
-
Important
-
None
-
5
-
Metal Platform 227, Metal Platform 228, Metal Platform 229, Metal Platform 230, Metal Platform 232, Metal Platform 233
-
6
-
Rejected
-
False
-
-
-
Bug Fix
-
Proposed
-
Customer Escalated
-
-
Description of problem:
Single-node OpenShift (SNO) cluster installation using the assisted installer running on an ACM hub cluster. The hub cluster is OCP 4.10.33 with ACM 2.5.4. When a cluster fails to install, we remove the installation CRs and the cluster namespace from the hub cluster (to eventually redeploy). The termination of the namespace hangs indefinitely (14+ hours) with finalizers remaining. To resolve the hang, we can remove the finalizers by editing both the BareMetalHost CR and the secret referenced by BareMetalHost .spec.bmc.credentialsName. Once these finalizers are removed, the namespace termination completes within a few seconds.
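The manual workaround described above (editing the objects to drop their finalizers) could also be scripted. The following is a minimal sketch, not a recommended fix: it only builds the `oc patch` command lines that clear `metadata.finalizers` with a JSON merge patch. The BareMetalHost and secret names are hypothetical placeholders; only the namespace `cnfocto1` comes from this report.

```python
import json
import shlex

# JSON merge patch that sets metadata.finalizers to null, which removes all finalizers.
CLEAR_FINALIZERS_PATCH = json.dumps({"metadata": {"finalizers": None}})


def clear_finalizer_commands(namespace, bmh_name, secret_name):
    """Return `oc patch` argument lists for the BareMetalHost and its BMC credentials secret."""
    targets = [("baremetalhost", bmh_name), ("secret", secret_name)]
    return [
        ["oc", "patch", kind, name, "-n", namespace,
         "--type=merge", "-p", CLEAR_FINALIZERS_PATCH]
        for kind, name in targets
    ]


if __name__ == "__main__":
    # "example-bmh" and "example-bmc-secret" are assumed names for illustration only.
    for argv in clear_finalizer_commands("cnfocto1", "example-bmh", "example-bmc-secret"):
        print(shlex.join(argv))
```

Clearing finalizers by hand skips whatever cleanup the owning controller intended to perform, so this is only a way to unblock namespace termination, as in the workaround above.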
Version-Release number of selected component (if applicable):
OCP 4.10.33, ACM 2.5.4
How reproducible:
Always
Steps to Reproduce:
1. Generate installation CRs (AgentClusterInstall, BMH, ClusterDeployment, InfraEnv, NMStateConfig, ...) with an invalid configuration parameter. Two scenarios validated to hit this issue:
   a. Invalid rootDeviceHints in the BareMetalHost CR
   b. Invalid credentials in the secret referenced by BareMetalHost .spec.bmc.credentialsName
2. Apply the installation CRs to the hub cluster
3. Wait for the cluster installation to fail
4. Remove the cluster installation CRs and namespace
Actual results:
Cluster namespace remains in Terminating state indefinitely:
$ oc get ns cnfocto1
NAME       STATUS        AGE
cnfocto1   Terminating   17h
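To see which objects are holding the namespace in Terminating, one approach is to dump the remaining objects as JSON (for example with `oc get <kind> -n cnfocto1 -o json`) and filter for those that are marked for deletion but still carry finalizers. The sketch below assumes the parsed `items` list from such output; the sample objects and the finalizer string are made-up placeholders, not values taken from the affected cluster.

```python
def stuck_objects(items):
    """Return (kind, name, finalizers) for objects marked for deletion that still have finalizers."""
    stuck = []
    for obj in items:
        meta = obj.get("metadata", {})
        finalizers = meta.get("finalizers") or []
        # deletionTimestamp is set once deletion was requested; finalizers block the actual removal.
        if meta.get("deletionTimestamp") and finalizers:
            stuck.append((obj.get("kind", ""), meta.get("name", ""), list(finalizers)))
    return stuck


if __name__ == "__main__":
    # Placeholder data for illustration only.
    sample = [
        {"kind": "BareMetalHost",
         "metadata": {"name": "example-bmh",
                      "deletionTimestamp": "2022-01-01T00:00:00Z",
                      "finalizers": ["example.com/placeholder-finalizer"]}},
        {"kind": "ConfigMap", "metadata": {"name": "example-cm"}},
    ]
    for kind, name, finalizers in stuck_objects(sample):
        print(kind, name, finalizers)
```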
Expected results:
Cluster namespace (and all installation CRs in it) are successfully removed.
Additional info:
The installation CRs are applied to and removed from the hub cluster using argocd. The CRs have the following waves applied to them, which affect the creation order (lowest to highest) and removal order (highest to lowest):
Namespace: 0
AgentClusterInstall: 1
ClusterDeployment: 1
NMStateConfig: 1
InfraEnv: 1
BareMetalHost: 1
HostFirmwareSettings: 1
ConfigMap: 1 (extra manifests)
ManagedCluster: 2
KlusterletAddonConfig: 2
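The ordering implied by these waves can be made explicit with a small sketch. It only models the rule stated above (create lowest to highest, remove highest to lowest) and says nothing about ordering within a wave; the `WAVES` mapping is copied from the list in this report, and the helper name `by_wave` is an assumption for illustration.

```python
from itertools import groupby

# Wave values as listed in this report.
WAVES = {
    "Namespace": 0,
    "AgentClusterInstall": 1,
    "ClusterDeployment": 1,
    "NMStateConfig": 1,
    "InfraEnv": 1,
    "BareMetalHost": 1,
    "HostFirmwareSettings": 1,
    "ConfigMap": 1,
    "ManagedCluster": 2,
    "KlusterletAddonConfig": 2,
}


def by_wave(waves, reverse=False):
    """Group kinds by wave: ascending for creation, descending (reverse=True) for removal."""
    ordered = sorted(waves.items(), key=lambda kv: kv[1], reverse=reverse)
    return [
        (wave, [kind for kind, _ in group])
        for wave, group in groupby(ordered, key=lambda kv: kv[1])
    ]


if __name__ == "__main__":
    print("removal order:")
    for wave, kinds in by_wave(WAVES, reverse=True):
        print(f"  wave {wave}: {', '.join(kinds)}")
```

Under this model the BareMetalHost (wave 1) is removed before the Namespace (wave 0), so a BareMetalHost that cannot finish deleting leaves the namespace waiting on it.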
- is cloned by
  OCPBUGS-9955 [4.12] BareMetalHost CR fails to delete on cluster cleanup (Closed)
  OCPBUGS-24187 BareMetalHost CR fails to delete on cluster cleanup (Closed)
- is depended on by
  OCPBUGS-9955 [4.12] BareMetalHost CR fails to delete on cluster cleanup (Closed)