  OpenShift Bugs / OCPBUGS-33688

Hosted cluster deployment fails because one agent's installation never completes


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: 4.14
    • Component/s: HyperShift / Agent

      Description of problem:

        Hosted cluster deployment failed: one agent, together with its Machine and BareMetalHost (BMH), failed to finish installing.
         

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          TBD

      Steps to Reproduce:

          1. Deploy a hub cluster with a hosted cluster of 6 agent nodes. I used this job: https://auto-jenkins-csb-kniqe.apps.ocp-c1.prod.psi.redhat.com/job/CI/job/job-runner/2811/
          2. The ocp-spoke-assisted-operator-deploy step fails with a timeout while waiting for the agents to reach the Done stage.
          3. One of the 6 agents never reaches the Done stage; its STAGE stays empty indefinitely (see Actual results).
      
          

      Actual results:

      [kni@ocp-edge77 ~]$ oc get agents -A
      NAMESPACE   NAME                                   CLUSTER    APPROVED   ROLE     STAGE
      hosted-0    5168eb46-1b7f-41d0-a667-abf19808104b   hosted-0   true       worker   
      hosted-0    699e5462-1d63-4c42-8597-fb3c246b272b   hosted-0   true       worker   Done
      hosted-0    7d0af4aa-6d47-41ff-a45d-c18771365d07   hosted-0   true       worker   Done
      hosted-0    da6d5bd9-ba0c-44c6-880d-08199f7818f1   hosted-0   true       worker   Done
      hosted-0    e3bd914e-a437-4f4e-973e-49156a7385cc   hosted-0   true       worker   Done
      hosted-0    edd80256-c390-4189-8782-a498f80fc016   hosted-0   true       worker   Done
       
      Conditions in the description of the problematic agent (5168eb46-1b7f-41d0-a667-abf19808104b, the only one without a Done stage):
          Last Transition Time:  2024-05-14T23:42:54Z
          Message:               The agent installation stopped
          Reason:                AgentInstallationStopped
          Status:                True
          Type:                  RequirementsMet
          Last Transition Time:  2024-05-14T23:40:48Z
          Message:               The installation has failed: installation command failed
          Reason:                InstallationFailed
          Status:                False
          Type:                  Installed
      
      
      One machine is stuck in the Pending phase:
      [kni@ocp-edge77 ~]$ oc get machines -A
      NAMESPACE           NAME             CLUSTER          NODENAME            PROVIDERID                                     PHASE     AGE   VERSION
      clusters-hosted-0   hosted-0-gf8zv   hosted-0-jf4df   hosted-worker-0-4   agent://edd80256-c390-4189-8782-a498f80fc016   Running   8h    4.14.25
      clusters-hosted-0   hosted-0-jwfd4   hosted-0-jf4df   hosted-worker-0-3   agent://7d0af4aa-6d47-41ff-a45d-c18771365d07   Running   8h    4.14.25
      clusters-hosted-0   hosted-0-kk76m   hosted-0-jf4df   hosted-worker-0-2   agent://da6d5bd9-ba0c-44c6-880d-08199f7818f1   Running   8h    4.14.25
      clusters-hosted-0   hosted-0-kn8tg   hosted-0-jf4df                                                                      Pending   8h    4.14.25
      clusters-hosted-0   hosted-0-r8hfh   hosted-0-jf4df   hosted-worker-0-0   agent://699e5462-1d63-4c42-8597-fb3c246b272b   Running   8h    4.14.25
      clusters-hosted-0   hosted-0-rjjpb   hosted-0-jf4df   hosted-worker-0-1   agent://e3bd914e-a437-4f4e-973e-49156a7385cc   Running   8h    4.14.25
      
      
      Errors in that machine's description:
          Last Transition Time:  2024-05-14T23:44:11Z
          Message:               0 of 2 completed
          Reason:                InstallationFailed
          Severity:              Error
          Status:                False
          Type:                  Ready
          Last Transition Time:  2024-05-14T23:44:11Z
          Message:               4 of 5 completed
          Reason:                InstallationFailed
          Severity:              Error
          Status:                False
          Type:                  InfrastructureReady
          Last Transition Time:  2024-05-14T23:41:46Z
          Reason:                WaitingForNodeRef
          Severity:              Info
          Status:                False
          Type:                  NodeHealthy
        Last Updated:            2024-05-14T23:41:46Z
      
      
      One BMH is stuck in the provisioning state:
      [kni@ocp-edge77 ~]$ oc get bmh -n hosted-0
      NAME                    STATE          CONSUMER   ONLINE   ERROR   AGE
      hosted-worker-0-0-bmh   provisioned               true             9h
      hosted-worker-0-1-bmh   provisioned               true             9h
      hosted-worker-0-2-bmh   provisioned               true             9h
      hosted-worker-0-3-bmh   provisioned               true             9h
      hosted-worker-0-4-bmh   provisioned               true             9h
      hosted-worker-0-5-bmh   provisioning              true             9h
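Each of the three listings above contains exactly one outlier (the agent with no stage, the Pending machine and the provisioning BMH). When triaging a run like this, those rows can be filtered out mechanically. The following is a minimal sketch, assuming the column layouts shown above and awk's default whitespace splitting; `agents_not_done`, `machines_not_running` and `bmh_not_provisioned` are hypothetical helper names, not part of `oc`.

```shell
#!/usr/bin/env bash
# Hypothetical triage helpers: each reads the plain-table output of one
# `oc get` command on stdin and prints only the rows that look stuck.
# Assumption: whitespace-separated columns as in the listings above. Empty
# cells (STAGE, NODENAME, PROVIDERID, CONSUMER, ERROR) vanish when awk splits
# on whitespace, so field positions are chosen with that in mind.

# `oc get agents -A` columns: NAMESPACE NAME CLUSTER APPROVED ROLE STAGE
# Reports every agent whose STAGE is blank or anything other than "Done".
agents_not_done() {
  awk 'NR > 1 && NF > 0 && $6 != "Done" { print $1, $2 }'
}

# `oc get machines -A` columns: ... NODENAME PROVIDERID PHASE AGE VERSION
# NODENAME and PROVIDERID can be empty, so PHASE is read from the row's end.
machines_not_running() {
  awk 'NR > 1 && NF > 0 && $(NF-2) != "Running" { print $1, $2, $(NF-2) }'
}

# `oc get bmh -n <namespace>` columns: NAME STATE CONSUMER ONLINE ERROR AGE
# NAME and STATE always come first, so STATE is the second field.
bmh_not_provisioned() {
  awk 'NR > 1 && NF > 0 && $2 != "provisioned" { print $1, $2 }'
}

# Usage against a live hub cluster:
#   oc get agents -A | agents_not_done
#   oc get machines -A | machines_not_running
#   oc get bmh -n hosted-0 | bmh_not_provisioned
```

Parsing the human-readable table is only meant for quick triage; `-o json` or `-o jsonpath` output is more robust when column contents change.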

      Expected results:

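          All 6 agents reach the Done stage, all Machines reach Running, all BMHs reach provisioned, and the hosted cluster deployment completes.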

      Additional info:

          

       

              Crystal Chun (cchun@redhat.com)
              Gal Amado (rhn-support-gamado)
              Votes: 0
              Watchers: 2
