-
Bug
-
Resolution: Obsolete
-
Major
Occurs after running make openstack_cleanup.
The same issue was reported in: https://coreos.slack.com/archives/CQXJFGMK6/p1668728786117169
The issue reproduced, so I'm opening a bug. Live env: titan131. Trying the workaround (W/A) now.
I'm not sure whether all the logs are still in place, since the pods get deleted, but the error message below is the only one found in the ironic-conductor container on that pod.
@oko-dev I'd appreciate some help debugging: hybrid titan131 has had its BMH computes stuck in the "deprovisioning" state for a day or so.
This happened after a successful sriov_dpdk deployment, followed by the olm and openstack cleanup. We then waited to redeploy, but the BMHs got stuck in deprovisioning.
[root@titan131 ansible]# oc get bmh -A
NAMESPACE               NAME                 STATE            CONSUMER   ONLINE   ERROR   AGE
openshift-machine-api   openshift-worker-0   deprovisioning              false            10d
openshift-machine-api   openshift-worker-1   deprovisioning              false            10d
thanks
Checking the metal3 ironic-conductor log:
oc logs -n openshift-machine-api -c metal3-ironic-conductor metal3-66fd9468b9-ms4qq
shows this error:
2022-11-17 13:48:36.340 1 ERROR ironic.drivers.modules.drac.management [req-ce96d756-03d0-4220-88a7-8b565bce5015 ironic-user - - - -] DRAC driver failed to clear the job queue for node 6827dd15-7998-4a60-b2af-b96fc20b9330. Reason: DRAC operation failed. Messages: ["DRAC operation failed. Messages: ['A running job cannot be deleted.'] JID_CLEARALL"].: dracclient.exceptions.DRACOperationFailed: DRAC operation failed. Messages: ["DRAC operation failed. Messages: ['A running job cannot be deleted.'] JID_CLEARALL"]
I deleted the metal3 pod to restart ironic. That seems to have cleared the error, and after a while the nodes switched to available.
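The workaround described above could be scripted roughly as follows. This is only a sketch: the helper names stuck_bmh and restart_metal3 are hypothetical, the pod name is the one from the log command in this report and will change once the pod is recreated, and the filter assumes STATE is the third column of the oc get bmh -A output as shown earlier.

```shell
# Hypothetical helper: print the names of BareMetalHosts whose STATE column
# (third field of `oc get bmh -A`) is "deprovisioning". Skips the header row.
stuck_bmh() {
  awk 'NR > 1 && $3 == "deprovisioning" { print $2 }'
}

# Workaround from this report: delete the metal3 pod so that ironic restarts.
# $1 is the current metal3 pod name (it differs after every restart).
restart_metal3() {
  oc delete pod -n openshift-machine-api "$1"
}

# Usage on a live cluster (not executed here):
#   oc get bmh -A | stuck_bmh
#   restart_metal3 metal3-66fd9468b9-ms4qq
#   oc get bmh -A -w    # watch until the hosts report "available"
```

Deleting the pod only restarts ironic; whether the underlying DRAC job queue error returns on the next deprovisioning was not verified in this report.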