-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.18.z
Description of problem:
While installing an Agent-based HostedCluster using HyperShift, only 2 out of 4 worker nodes successfully join the cluster. The remaining 2 nodes are visible in oc get nodes but are stuck in “Removing from cluster” state during installation.

Description
Customer is deploying multiple HostedClusters (agent-based) via ArgoCD using a common build manifest. During installation of one HostedCluster, only 2/4 NodePool replicas successfully complete provisioning and receive intended roles/labels. The other 2 nodes get stuck in “Removing from cluster” state during installation, even though they appear in oc get nodes.

This results in:
- HostedCluster installation incomplete
- NodePool replica count not achieved
- Node role labels inconsistent across nodes

The build manifest places multiple HostedClusters and Agent inventory resources in the same namespace (clusters) and uses:
- platform.agent.agentNamespace: clusters for all HostedClusters
- InfraEnv, BareMetalHost, NMStateConfig also in the same clusters namespace

This appears to allow cross-cluster agent adoption / reconciliation conflicts, leading to stuck node removal during provisioning.

Customer Impact
- Hosted cluster installation fails or remains stuck
- Nodes oscillate/remain stuck in removing state
- Prevents scaling / installing additional hosted clusters reliably
- Requires manual remediation / reinstall
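The shared-namespace condition described above can be checked mechanically before a build manifest is applied. The following is a minimal sketch, not part of the customer's tooling: it assumes the HostedCluster manifests have already been loaded into Python dicts and that the setting lives at spec.platform.agent.agentNamespace. The function name and the cluster names hc-a, hc-b and hc-c are hypothetical.

```python
from collections import defaultdict


def shared_agent_namespaces(hosted_clusters):
    """Return {agentNamespace: [HostedCluster names]} for every agent
    namespace referenced by more than one HostedCluster."""
    by_namespace = defaultdict(list)
    for hc in hosted_clusters:
        name = hc.get("metadata", {}).get("name", "<unnamed>")
        # Assumed field path: spec.platform.agent.agentNamespace
        agent = hc.get("spec", {}).get("platform", {}).get("agent") or {}
        namespace = agent.get("agentNamespace")
        if namespace:
            by_namespace[namespace].append(name)
    # Keep only namespaces that are shared by two or more HostedClusters
    return {ns: sorted(names) for ns, names in by_namespace.items() if len(names) > 1}


# Hypothetical manifests: two clusters share "clusters", one has its own namespace
manifests = [
    {"metadata": {"name": "hc-a"},
     "spec": {"platform": {"agent": {"agentNamespace": "clusters"}}}},
    {"metadata": {"name": "hc-b"},
     "spec": {"platform": {"agent": {"agentNamespace": "clusters"}}}},
    {"metadata": {"name": "hc-c"},
     "spec": {"platform": {"agent": {"agentNamespace": "hc-c-agents"}}}},
]

conflicts = shared_agent_namespaces(manifests)
for namespace, names in conflicts.items():
    print(f"agentNamespace {namespace!r} is shared by: {', '.join(names)}")
```

A non-empty result only shows that the configuration matches the one reported here. It does not by itself confirm that the shared namespace is the root cause.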
Version-Release number of selected component (if applicable):
4.18
How reproducible:
100%
Steps to Reproduce:
1. Create a namespace named clusters
2. Deploy multiple HostedCluster resources into clusters
3. Deploy multiple NodePool resources into clusters (replicas=4)
4. Set the following on all HostedClusters:
   platform:
     agent:
       agentNamespace: clusters
5. Create all inventory resources (InfraEnv, BareMetalHost, NMStateConfig) in the same clusters namespace
6. Start HostedCluster installation for one hosted cluster
7. Observe that only 2/4 nodes successfully complete join and labeling; remaining nodes get stuck in “Removing from cluster”
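To make the observation in step 7 repeatable, the nodes that did not receive the intended role label can be listed from a saved node listing. This is a rough sketch under stated assumptions: the input is the JSON produced by oc get nodes -o json run against the hosted cluster, and the intended label is node-role.kubernetes.io/worker, which should be replaced with whatever label the NodePool is expected to apply. The function name and the node names in the sample are hypothetical.

```python
import json


def nodes_missing_label(node_list, label):
    """Return the sorted names of nodes in a NodeList that lack `label`."""
    missing = []
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        labels = metadata.get("labels") or {}
        if label not in labels:
            missing.append(metadata.get("name", "<unnamed>"))
    return sorted(missing)


# Hypothetical, trimmed-down NodeList: two labeled nodes, two unlabeled ones
sample = json.loads("""
{
  "kind": "List",
  "items": [
    {"metadata": {"name": "worker-0",
                  "labels": {"node-role.kubernetes.io/worker": ""}}},
    {"metadata": {"name": "worker-1",
                  "labels": {"node-role.kubernetes.io/worker": ""}}},
    {"metadata": {"name": "worker-2", "labels": {}}},
    {"metadata": {"name": "worker-3"}}
  ]
}
""")

# Assumed label; adjust to the role/label the NodePool is meant to set
unlabeled = nodes_missing_label(sample, "node-role.kubernetes.io/worker")
print(f"{len(unlabeled)} node(s) missing the label: {unlabeled}")
```

Against a live cluster, the same function could be given the parsed output of oc get nodes -o json in place of the sample.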
Actual results:
Some of the nodes fail to join the cluster.
Expected results:
All nodes should be added to the cluster.
Additional info: