  1. OpenShift Bugs
  2. OCPBUGS-9410

OpenShift IPI installation stuck with random Dell (iDRAC) servers boot looping

Details

    • Important
    • Rejected
    • Unspecified
    • If docs needed, set a value

    Description

      Description of problem:

      When deploying a bare-metal 4.10 IPI cluster on Dell PowerEdge R740xd servers, some servers block the installation. The problematic servers have been identified, but all the servers are the same model.

      Later, if one of these problematic servers is used to deploy a worker node, the node is provisioned without any issue.

      The installer runs indefinitely, repeating the `DEBUG ironic_node_v1.openshift-master-host[0]: Still creating...` message. It is usually able to produce a "Creation complete after" message for only 2 of the 3 masters:

      ```logs
      DEBUG ironic_node_v1.openshift-master-host[0]: Still creating... [13m11s elapsed]
      DEBUG ironic_node_v1.openshift-master-host[1]: Still creating... [13m11s elapsed]
      DEBUG ironic_node_v1.openshift-master-host[2]: Still creating... [13m11s elapsed]
      DEBUG ironic_node_v1.openshift-master-host[0]: Creation complete after 13m16s [id=e7abba2f-fbf8-4151-befe-ebefefd4b46b]
      DEBUG ironic_node_v1.openshift-master-host[2]: Still creating... [13m21s elapsed]
      DEBUG ironic_node_v1.openshift-master-host[1]: Still creating... [13m21s elapsed]
      DEBUG ironic_node_v1.openshift-master-host[1]: Creation complete after 13m21s [id=b435778e-229d-4bc1-bc3f-8aea881a0310]
      DEBUG ironic_node_v1.openshift-master-host[2]: Still creating... [13m31s elapsed]
      DEBUG ironic_node_v1.openshift-master-host[2]: Still creating... [13m41s elapsed]
      DEBUG ironic_node_v1.openshift-master-host[2]: Still creating... [13m51s elapsed]
      ```
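To spot the stuck master quickly in a long installer log, the output above can be scanned for hosts that never report "Creation complete after". The following is a minimal sketch, not part of the installer: the function name `stuck_masters` and the `SAMPLE` excerpt are illustrative, and the regular expression assumes the log lines keep exactly the format shown above.

```python
import re
from typing import List

# Matches the two message kinds seen in the installer debug output above.
# Assumption: the line format is exactly as in the excerpt from this report.
LINE_RE = re.compile(
    r"ironic_node_v1\.openshift-master-host\[(\d+)\]: "
    r"(Still creating\.\.\.|Creation complete after)"
)


def stuck_masters(log_text: str) -> List[int]:
    """Return indices of master hosts that never logged 'Creation complete after'."""
    seen = set()
    done = set()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        index = int(match.group(1))
        seen.add(index)
        if match.group(2).startswith("Creation complete"):
            done.add(index)
    return sorted(seen - done)


# Shortened excerpt of the log from this report.
SAMPLE = """\
DEBUG ironic_node_v1.openshift-master-host[0]: Still creating... [13m11s elapsed]
DEBUG ironic_node_v1.openshift-master-host[1]: Still creating... [13m11s elapsed]
DEBUG ironic_node_v1.openshift-master-host[2]: Still creating... [13m11s elapsed]
DEBUG ironic_node_v1.openshift-master-host[0]: Creation complete after 13m16s [id=e7abba2f-fbf8-4151-befe-ebefefd4b46b]
DEBUG ironic_node_v1.openshift-master-host[1]: Creation complete after 13m21s [id=b435778e-229d-4bc1-bc3f-8aea881a0310]
DEBUG ironic_node_v1.openshift-master-host[2]: Still creating... [13m51s elapsed]
"""

if __name__ == "__main__":
    print(stuck_masters(SAMPLE))
```

A host that appears in the result is only a candidate: a master that is merely slow looks the same until its completion line is written.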

      From that point on, the third master (index 2) cycles through the "cleaning", "clean failed", "manageable" and "clean wait" states (see ironic-bmh-state.log for details). The installer can run for days in this state.
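The cycling described above can be detected mechanically once the node's provisioning states have been collected in order. The sketch below is only an illustration under stated assumptions: it takes a plain list of state names (how they are extracted from ironic-bmh-state.log is not covered, since that file's format is not shown here), and the function names, the threshold of 2, and the sample sequence are hypothetical.

```python
from itertools import groupby
from typing import Iterable


def count_clean_failures(states: Iterable[str]) -> int:
    """Count distinct entries into the 'clean failed' state.

    Consecutive repeats of the same state (e.g. from repeated polling)
    are collapsed first, so one long failure is counted once.
    """
    collapsed = [state for state, _ in groupby(states)]
    return sum(1 for state in collapsed if state == "clean failed")


def is_clean_looping(states: Iterable[str], threshold: int = 2) -> bool:
    """Treat the node as looping once it has failed cleaning `threshold` times.

    The threshold is an arbitrary illustrative value, not an Ironic setting.
    """
    return count_clean_failures(states) >= threshold


# Hypothetical state sequence; the ordering is made up for illustration.
observed = [
    "cleaning", "clean wait", "clean failed", "manageable",
    "cleaning", "clean wait", "clean failed", "clean failed", "manageable",
]

if __name__ == "__main__":
    print(count_clean_failures(observed), is_clean_looping(observed))
```

Counting entries into "clean failed" rather than raw occurrences keeps the result independent of how often the state was sampled.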

      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:

      Unable to deploy the cluster when using some servers

      Expected results:

      Be able to deploy the cluster with all the servers

      Additional info:

          People

            tsedovic@redhat.com Tomas Sedovic
            rhn-support-malonso Maria Del Mar Alonso
            Pedro Jose Amoedo Martinez
            Red Hat Employee
