-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.14
-
None
-
No
-
False
-
Description of problem:
Hosted cluster deployment fails: 1 of the 6 agents fails to install, and the corresponding machine and bmh never become ready.
Version-Release number of selected component (if applicable):
How reproducible:
TBD
Steps to Reproduce:
1. Deploy a hub cluster with a hosted cluster of 6 agent nodes. I used this job: https://auto-jenkins-csb-kniqe.apps.ocp-c1.prod.psi.redhat.com/job/CI/job/job-runner/2811/
2. ocp-spoke-assisted-operator-deploy fails with a timeout while waiting for the agents to be done.
3. 1 of the 6 agents never finishes installing:
Actual results:
[kni@ocp-edge77 ~]$ oc get agents -A
NAMESPACE   NAME                                   CLUSTER    APPROVED   ROLE     STAGE
hosted-0    5168eb46-1b7f-41d0-a667-abf19808104b   hosted-0   true       worker
hosted-0    699e5462-1d63-4c42-8597-fb3c246b272b   hosted-0   true       worker   Done
hosted-0    7d0af4aa-6d47-41ff-a45d-c18771365d07   hosted-0   true       worker   Done
hosted-0    da6d5bd9-ba0c-44c6-880d-08199f7818f1   hosted-0   true       worker   Done
hosted-0    e3bd914e-a437-4f4e-973e-49156a7385cc   hosted-0   true       worker   Done
hosted-0    edd80256-c390-4189-8782-a498f80fc016   hosted-0   true       worker   Done

Conditions in the description of the problematic agent:
    Last Transition Time:  2024-05-14T23:42:54Z
    Message:               The agent installation stopped
    Reason:                AgentInstallationStopped
    Status:                True
    Type:                  RequirementsMet
    Last Transition Time:  2024-05-14T23:40:48Z
    Message:               The installation has failed: installation command failed
    Reason:                InstallationFailed
    Status:                False
    Type:                  Installed

One machine is stuck in Pending:
[kni@ocp-edge77 ~]$ oc get machines -A
NAMESPACE           NAME             CLUSTER          NODENAME            PROVIDERID                                     PHASE     AGE   VERSION
clusters-hosted-0   hosted-0-gf8zv   hosted-0-jf4df   hosted-worker-0-4   agent://edd80256-c390-4189-8782-a498f80fc016   Running   8h    4.14.25
clusters-hosted-0   hosted-0-jwfd4   hosted-0-jf4df   hosted-worker-0-3   agent://7d0af4aa-6d47-41ff-a45d-c18771365d07   Running   8h    4.14.25
clusters-hosted-0   hosted-0-kk76m   hosted-0-jf4df   hosted-worker-0-2   agent://da6d5bd9-ba0c-44c6-880d-08199f7818f1   Running   8h    4.14.25
clusters-hosted-0   hosted-0-kn8tg   hosted-0-jf4df                                                                      Pending   8h    4.14.25
clusters-hosted-0   hosted-0-r8hfh   hosted-0-jf4df   hosted-worker-0-0   agent://699e5462-1d63-4c42-8597-fb3c246b272b   Running   8h    4.14.25
clusters-hosted-0   hosted-0-rjjpb   hosted-0-jf4df   hosted-worker-0-1   agent://e3bd914e-a437-4f4e-973e-49156a7385cc   Running   8h    4.14.25

Errors in the description of the pending machine:
    Last Transition Time:  2024-05-14T23:44:11Z
    Message:               0 of 2 completed
    Reason:                InstallationFailed
    Severity:              Error
    Status:                False
    Type:                  Ready
    Last Transition Time:  2024-05-14T23:44:11Z
    Message:               4 of 5 completed
    Reason:                InstallationFailed
    Severity:              Error
    Status:                False
    Type:                  InfrastructureReady
    Last Transition Time:  2024-05-14T23:41:46Z
    Reason:                WaitingForNodeRef
    Severity:              Info
    Status:                False
    Type:                  NodeHealthy
  Last Updated:            2024-05-14T23:41:46Z

One bmh is stuck in provisioning:
[kni@ocp-edge77 ~]$ oc get bmh -n hosted-0
NAME                    STATE          CONSUMER   ONLINE   ERROR   AGE
hosted-worker-0-0-bmh   provisioned               true             9h
hosted-worker-0-1-bmh   provisioned               true             9h
hosted-worker-0-2-bmh   provisioned               true             9h
hosted-worker-0-3-bmh   provisioned               true             9h
hosted-worker-0-4-bmh   provisioned               true             9h
hosted-worker-0-5-bmh   provisioning              true             9h
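
For triage on reruns, the check done by eye above (find the agent whose STAGE is not Done) can be scripted. The following is only a minimal sketch, not part of the failing job: it assumes the space-aligned table layout that `oc get agents -A` prints, and the helper names `parse_table` and `unfinished_agents` are hypothetical. It splits on the header's column offsets rather than on whitespace, because the stuck agent has an empty STAGE cell.

```python
def parse_table(text):
    """Parse a space-aligned `oc get` table into a list of dicts.

    Assumption: every column starts at the same offset as its header,
    and header names contain no spaces.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0]
    # A column starts wherever a non-space character follows a space (or at offset 0).
    starts = [i for i, ch in enumerate(header)
              if ch != " " and (i == 0 or header[i - 1] == " ")]
    names = header.split()
    rows = []
    for ln in lines[1:]:
        row = {}
        for idx, name in enumerate(names):
            end = starts[idx + 1] if idx + 1 < len(starts) else None
            # Slicing past the end of a short line yields "", i.e. an empty cell.
            row[name] = ln[starts[idx]:end].strip()
        rows.append(row)
    return rows


def unfinished_agents(text):
    """Return the NAME of every agent whose STAGE column is not 'Done'."""
    return [row["NAME"] for row in parse_table(text) if row.get("STAGE") != "Done"]
```

Passing the captured output of `oc get agents -A` to `unfinished_agents` should return the names of the agents still to be inspected with `oc describe`. The same `parse_table` helper would apply to the machines and bmh tables, filtering on PHASE or STATE instead.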
Expected results:
Additional info: