OpenShift Bugs / OCPBUGS-15232

Installation failed - 0 hosts available while choosing host for machine


    • Important
    • No
    • 2
    • Metal Platform 238, Metal Platform 239
    • 2
    • Rejected
    • False

      Description of problem:

      Cluster deployment of 4.14.0-0.nightly-2023-06-20-065807 fails: the worker BareMetalHosts are stuck in the inspecting state even though Ironic reports the corresponding nodes as manageable.
      

      From the logs of machine-controller container in machine-api-controllers pod:

      I0621 06:12:02.779472       1 request.go:682] Waited for 2.095824347s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/performance.openshift.io/v2?timeout=32s
      E0621 06:12:02.781540       1 logr.go:270] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"Metal3Remediation\" in version \"infrastructure.cluster.x-k8s.io/v1beta1\""  "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"Metal3Remediation"}
      I0621 06:12:02.783418       1 controller.go:179] kni-qe-4-tj65t-worker-0-h6s8g: reconciling Machine
      2023/06/21 06:12:02 Checking if machine kni-qe-4-tj65t-worker-0-h6s8g exists.
      2023/06/21 06:12:02 Machine kni-qe-4-tj65t-worker-0-h6s8g does not exist.
      I0621 06:12:02.783439       1 controller.go:372] kni-qe-4-tj65t-worker-0-h6s8g: reconciling machine triggers idempotent create
      2023/06/21 06:12:02 Creating machine kni-qe-4-tj65t-worker-0-h6s8g
      2023/06/21 06:12:02 0 hosts available while choosing host for machine 'kni-qe-4-tj65t-worker-0-h6s8g'
      2023/06/21 06:12:02 No available BareMetalHost found
      W0621 06:12:02.783735       1 controller.go:374] kni-qe-4-tj65t-worker-0-h6s8g: failed to create machine: requeue in: 30s
      I0621 06:12:02.783748       1 controller.go:404] Actuator returned requeue-after error: requeue in: 30s
      I0621 06:12:02.783780       1 controller.go:179] kni-qe-4-tj65t-worker-0-j259x: reconciling Machine
      2023/06/21 06:12:02 Checking if machine kni-qe-4-tj65t-worker-0-j259x exists.
      2023/06/21 06:12:02 Machine kni-qe-4-tj65t-worker-0-j259x does not exist.
      I0621 06:12:02.783792       1 controller.go:372] kni-qe-4-tj65t-worker-0-j259x: reconciling machine triggers idempotent create
      2023/06/21 06:12:02 Creating machine kni-qe-4-tj65t-worker-0-j259x
      2023/06/21 06:12:02 0 hosts available while choosing host for machine 'kni-qe-4-tj65t-worker-0-j259x'
      2023/06/21 06:12:02 No available BareMetalHost found
      W0621 06:12:02.783971       1 controller.go:374] kni-qe-4-tj65t-worker-0-j259x: failed to create machine: requeue in: 30s
      I0621 06:12:02.783976       1 controller.go:404] Actuator returned requeue-after error: requeue in: 30s
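To see at a glance which machines are affected, the excerpt above can be summarized with a small script. This is a minimal sketch, not part of the original report: the function name `machines_without_host` is hypothetical, and the regular expression only assumes the message format visible in the log lines quoted above.

```python
import re
from collections import Counter

# Matches the message format seen in the machine-controller log excerpt.
NO_HOST = re.compile(
    r"0 hosts available while choosing host for machine '([^']+)'"
)


def machines_without_host(log_text: str) -> Counter:
    """Count, per machine name, how often host selection found no host."""
    return Counter(NO_HOST.findall(log_text))


# Lines copied from the log excerpt in this report.
sample = """\
2023/06/21 06:12:02 Creating machine kni-qe-4-tj65t-worker-0-h6s8g
2023/06/21 06:12:02 0 hosts available while choosing host for machine 'kni-qe-4-tj65t-worker-0-h6s8g'
2023/06/21 06:12:02 No available BareMetalHost found
2023/06/21 06:12:02 Creating machine kni-qe-4-tj65t-worker-0-j259x
2023/06/21 06:12:02 0 hosts available while choosing host for machine 'kni-qe-4-tj65t-worker-0-j259x'
2023/06/21 06:12:02 No available BareMetalHost found
"""

counts = machines_without_host(sample)
for machine, n in sorted(counts.items()):
    print(f"{machine}: {n} failed host selection(s)")
```

Run against a full log, the per-machine counts grow by one for every 30-second requeue, which gives a rough idea of how long each machine has been waiting.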
      

      BMH Resources:

      oc get bmh -A
      NAMESPACE               NAME                 STATE                    CONSUMER                  ONLINE   ERROR   AGE
      openshift-machine-api   openshift-master-0   externally provisioned   kni-qe-4-tj65t-master-0   true             175m
      openshift-machine-api   openshift-master-1   externally provisioned   kni-qe-4-tj65t-master-1   true             175m
      openshift-machine-api   openshift-master-2   externally provisioned   kni-qe-4-tj65t-master-2   true             175m
      openshift-machine-api   openshift-worker-0   inspecting                                         true             175m
      openshift-machine-api   openshift-worker-1   inspecting                                         true             175m
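The listing above is consistent with the "0 hosts available" message in the machine-controller log: every host is either already consumed or still inspecting. The sketch below is a simplified model of such a filter, not the actuator's actual code. The `Host` class, the `available_hosts` function, and the set of selectable states are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: only hosts in these provisioning states can be picked.
# The real actuator may apply additional or different criteria.
SELECTABLE_STATES = {"available", "ready"}


@dataclass
class Host:
    name: str
    state: str
    consumer: Optional[str] = None  # machine that already owns the host
    error: str = ""


def available_hosts(hosts: List[Host]) -> List[Host]:
    """Return hosts that are unconsumed, error-free and in a selectable state."""
    return [
        h for h in hosts
        if h.consumer is None and not h.error and h.state in SELECTABLE_STATES
    ]


# Values taken from the `oc get bmh -A` output in this report.
hosts = [
    Host("openshift-master-0", "externally provisioned", "kni-qe-4-tj65t-master-0"),
    Host("openshift-master-1", "externally provisioned", "kni-qe-4-tj65t-master-1"),
    Host("openshift-master-2", "externally provisioned", "kni-qe-4-tj65t-master-2"),
    Host("openshift-worker-0", "inspecting"),
    Host("openshift-worker-1", "inspecting"),
]

candidates = available_hosts(hosts)
print(f"{len(candidates)} hosts available")
```

Under these assumptions the workers only become candidates once their BareMetalHost leaves the inspecting state, so the machine controller keeps requeueing until then.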
      

      From Ironic:

      baremetal node list
      +--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
      | UUID                                 | Name                                     | Instance UUID                        | Power State | Provisioning State | Maintenance |
      +--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
      | 86f146e3-3e48-4a7a-b0ef-57c42083fc92 | openshift-machine-api~openshift-master-0 | 7eeb9e57-2df2-4710-82d9-d3f99a20348e | power on    | active             | False       |
      | 2380f211-934f-4193-8cb1-d09e7008410c | openshift-machine-api~openshift-master-2 | fd856ced-2912-4800-848c-256c00a1fdb7 | power on    | active             | False       |
      | 9ad70c58-de44-4d56-9304-4bf7c95de6fb | openshift-machine-api~openshift-master-1 | aa1a4c89-4215-44ec-90c7-9c5f3de95ab8 | power on    | active             | False       |
      | bb5ea5f4-016c-4bdd-834d-61d575284bf3 | openshift-machine-api~openshift-worker-0 | None                                 | power off   | manageable         | False       |
      | 3045a07a-09d6-43a0-ab9c-d856b54bad6c | openshift-machine-api~openshift-worker-1 | None                                 | power off   | manageable         | False       |
      +--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
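The mismatch described in this report (BareMetalHost in inspecting while the Ironic node is manageable) can be detected by joining the two outputs on the host name. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the `oc` columns are assumed to be separated by at least two spaces, and the Ironic node name is assumed to have the form `<namespace>~<name>` as shown above. The sample data is a condensed copy of the outputs in this report. Whether this combination always indicates a stuck host is not established here; it only flags candidates for a closer look.

```python
import re


def parse_bmh(text):
    """Map BareMetalHost name to STATE from `oc get bmh -A` output."""
    states = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        parts = re.split(r"\s{2,}", line.strip())
        states[parts[1]] = parts[2]
    return states


def parse_ironic(text):
    """Map host name to Provisioning State from `baremetal node list`."""
    states = {}
    for line in text.splitlines():
        if not line.lstrip().startswith("|"):
            continue  # table border or blank line
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if cells[0] == "UUID":
            continue  # header row
        states[cells[1].split("~", 1)[-1]] = cells[4]
    return states


bmh_output = """\
NAMESPACE               NAME                 STATE                    CONSUMER                  ONLINE   ERROR   AGE
openshift-machine-api   openshift-master-0   externally provisioned   kni-qe-4-tj65t-master-0   true             175m
openshift-machine-api   openshift-worker-0   inspecting                                         true             175m
openshift-machine-api   openshift-worker-1   inspecting                                         true             175m
"""

ironic_output = """\
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
| 86f146e3-3e48-4a7a-b0ef-57c42083fc92 | openshift-machine-api~openshift-master-0 | 7eeb9e57-2df2-4710-82d9-d3f99a20348e | power on | active | False |
| bb5ea5f4-016c-4bdd-834d-61d575284bf3 | openshift-machine-api~openshift-worker-0 | None | power off | manageable | False |
| 3045a07a-09d6-43a0-ab9c-d856b54bad6c | openshift-machine-api~openshift-worker-1 | None | power off | manageable | False |
"""

bmh = parse_bmh(bmh_output)
ironic = parse_ironic(ironic_output)

# Hosts showing the symptom from this report.
possibly_stuck = sorted(
    name for name, state in bmh.items()
    if state == "inspecting" and ironic.get(name) == "manageable"
)
print(possibly_stuck)
```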
      

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-06-20-065807
      

      How reproducible:

      Observed once so far
      

      Steps to Reproduce:

      1. Deploy a bare-metal dual-stack cluster with day-1 networking
      

      Actual results:

      Deployment fails as worker nodes are not provisioned
      

      Expected results:

      Deployment succeeds
      

            rhn-engineering-dtantsur Dmitry Tantsur
            yprokule@redhat.com Yurii Prokulevych
            Jad Haj Yahya Jad Haj Yahya