-
Bug
-
Resolution: Done-Errata
-
Critical
-
4.14
-
Important
-
No
-
2
-
Metal Platform 238, Metal Platform 239
-
2
-
Rejected
-
False
-
-
Description of problem:
Cluster deployment of 4.14.0-0.nightly-2023-06-20-065807 fails because the worker BareMetalHosts are stuck in the inspecting state even though Ironic reports the corresponding nodes as manageable.
From the logs of the machine-controller container in the machine-api-controllers pod:
I0621 06:12:02.779472 1 request.go:682] Waited for 2.095824347s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/performance.openshift.io/v2?timeout=32s
E0621 06:12:02.781540 1 logr.go:270] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"Metal3Remediation\" in version \"infrastructure.cluster.x-k8s.io/v1beta1\"" "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"Metal3Remediation"}
I0621 06:12:02.783418 1 controller.go:179] kni-qe-4-tj65t-worker-0-h6s8g: reconciling Machine
2023/06/21 06:12:02 Checking if machine kni-qe-4-tj65t-worker-0-h6s8g exists.
2023/06/21 06:12:02 Machine kni-qe-4-tj65t-worker-0-h6s8g does not exist.
I0621 06:12:02.783439 1 controller.go:372] kni-qe-4-tj65t-worker-0-h6s8g: reconciling machine triggers idempotent create
2023/06/21 06:12:02 Creating machine kni-qe-4-tj65t-worker-0-h6s8g
2023/06/21 06:12:02 0 hosts available while choosing host for machine 'kni-qe-4-tj65t-worker-0-h6s8g'
2023/06/21 06:12:02 No available BareMetalHost found
W0621 06:12:02.783735 1 controller.go:374] kni-qe-4-tj65t-worker-0-h6s8g: failed to create machine: requeue in: 30s
I0621 06:12:02.783748 1 controller.go:404] Actuator returned requeue-after error: requeue in: 30s
I0621 06:12:02.783780 1 controller.go:179] kni-qe-4-tj65t-worker-0-j259x: reconciling Machine
2023/06/21 06:12:02 Checking if machine kni-qe-4-tj65t-worker-0-j259x exists.
2023/06/21 06:12:02 Machine kni-qe-4-tj65t-worker-0-j259x does not exist.
I0621 06:12:02.783792 1 controller.go:372] kni-qe-4-tj65t-worker-0-j259x: reconciling machine triggers idempotent create
2023/06/21 06:12:02 Creating machine kni-qe-4-tj65t-worker-0-j259x
2023/06/21 06:12:02 0 hosts available while choosing host for machine 'kni-qe-4-tj65t-worker-0-j259x'
2023/06/21 06:12:02 No available BareMetalHost found
W0621 06:12:02.783971 1 controller.go:374] kni-qe-4-tj65t-worker-0-j259x: failed to create machine: requeue in: 30s
I0621 06:12:02.783976 1 controller.go:404] Actuator returned requeue-after error: requeue in: 30s
BMH Resources:
oc get bmh -A
NAMESPACE               NAME                 STATE                    CONSUMER                  ONLINE   ERROR   AGE
openshift-machine-api   openshift-master-0   externally provisioned   kni-qe-4-tj65t-master-0   true             175m
openshift-machine-api   openshift-master-1   externally provisioned   kni-qe-4-tj65t-master-1   true             175m
openshift-machine-api   openshift-master-2   externally provisioned   kni-qe-4-tj65t-master-2   true             175m
openshift-machine-api   openshift-worker-0   inspecting                                         true             175m
openshift-machine-api   openshift-worker-1   inspecting                                         true             175m
From Ironic:
baremetal node list
+--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name                                     | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
| 86f146e3-3e48-4a7a-b0ef-57c42083fc92 | openshift-machine-api~openshift-master-0 | 7eeb9e57-2df2-4710-82d9-d3f99a20348e | power on    | active             | False       |
| 2380f211-934f-4193-8cb1-d09e7008410c | openshift-machine-api~openshift-master-2 | fd856ced-2912-4800-848c-256c00a1fdb7 | power on    | active             | False       |
| 9ad70c58-de44-4d56-9304-4bf7c95de6fb | openshift-machine-api~openshift-master-1 | aa1a4c89-4215-44ec-90c7-9c5f3de95ab8 | power on    | active             | False       |
| bb5ea5f4-016c-4bdd-834d-61d575284bf3 | openshift-machine-api~openshift-worker-0 | None                                 | power off   | manageable         | False       |
| 3045a07a-09d6-43a0-ab9c-d856b54bad6c | openshift-machine-api~openshift-worker-1 | None                                 | power off   | manageable         | False       |
+--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
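The mismatch between the two listings above can be checked mechanically. The following is a minimal sketch only: the function name `find_stuck_inspecting` and the dictionary layout are hypothetical and not part of any OpenShift, Metal3, or Ironic tooling. The state values are copied by hand from the `oc get bmh -A` and `baremetal node list` output in this report, and the `namespace~name` naming of Ironic nodes is taken from that output.

```python
# Illustrative sketch only: the function name and dict layout are assumptions,
# not part of any OpenShift, Metal3, or Ironic tooling.

# BareMetalHost states copied from the `oc get bmh -A` output in this report.
BMH_STATES = {
    "openshift-master-0": "externally provisioned",
    "openshift-master-1": "externally provisioned",
    "openshift-master-2": "externally provisioned",
    "openshift-worker-0": "inspecting",
    "openshift-worker-1": "inspecting",
}

# Ironic provisioning states copied from the `baremetal node list` output.
IRONIC_STATES = {
    "openshift-machine-api~openshift-master-0": "active",
    "openshift-machine-api~openshift-master-1": "active",
    "openshift-machine-api~openshift-master-2": "active",
    "openshift-machine-api~openshift-worker-0": "manageable",
    "openshift-machine-api~openshift-worker-1": "manageable",
}


def find_stuck_inspecting(bmh_states, ironic_states,
                          namespace="openshift-machine-api"):
    """Return hosts whose BMH says 'inspecting' while Ironic says 'manageable'."""
    stuck = []
    for host in sorted(bmh_states):
        # Ironic node names in this report follow the '<namespace>~<host>' form.
        ironic_state = ironic_states.get(f"{namespace}~{host}")
        if bmh_states[host] == "inspecting" and ironic_state == "manageable":
            stuck.append(host)
    return stuck


if __name__ == "__main__":
    for host in find_stuck_inspecting(BMH_STATES, IRONIC_STATES):
        print(f"{host}: BMH is inspecting, Ironic node is manageable")
```

With the data from this report, the check flags only the two worker hosts; the masters are ignored because their BMH state is not inspecting.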
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-20-065807
How reproducible:
Observed once so far.
Steps to Reproduce:
1. Deploy a bare-metal dual-stack cluster with day-1 networking.
Actual results:
Deployment fails because the worker nodes are not provisioned.
Expected results:
Deployment succeeds