-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.16
Description of problem:
When a cluster that is been deployed with Gitops/ZTP approach is being trying to deleted, some of the worker nodes stuck in "deprovisioning" or "deleting" state. When I check logs of metal3-baremetal-operator, I see the following message: {"level":"info","ts":1722497241.5526164,"logger":"provisioner.ironic","msg":"deleting host","host":"fi-911~storage-2.fi-911.tre.nsn-rdnet.net","ID":"accb0c10-8a95-4208-9e2d-c38d57ec0667","lastError":"Failed to prepare node accb0c10-8a95-4208-9e2d-c38d57ec0667 for cleaning: Validation of image href https://assisted-image-service-multicluster-engine.apps.fi-915.tre.nsn-rdnet.net/byapikey/eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiIwMzY1ZjlkNy0yNjFiLTRmMDUtOWMwYy0xMzNhZjMxZmU1ZDYifQ.4vZ9nNKf9BEv_xxRb_fqEgykwfa4O9Fs-k1kjyBcRjOF3c-qcbDaGMcEkSoOecMhDIFjoOkwTxSoVAKkpUrc0Q/4.16.0/x86_64/minimal.iso failed, reason: Got HTTP code 401 instead of 200 in response to HEAD request.","current":"clean failed","target":"available","deploy step":{"args":{},"interface":"deploy","priority":80,"step":"start_assisted_install"}} This is also effecting installation of cluster next time. When other nodes are booted up with discovery iso, these old nodes which were failing, this time stuck in "inspecting" state in ACM GUI. The nodes that are failing have "automatedCleaningMode: metadata" in their BMH resource.
Version-Release number of selected component (if applicable):
4.16.0
How reproducible:
Try to delete spoke cluster using Gitops approach and then some of the nodes who has "automatedCleaningMode: metadata" stuck in "deprovisioning state. Also, when you try to deploy the same cluster, these old node who were failing this time stuck in "inspecting" state.
Steps to Reproduce:
1. 2. 3.
Actual results:
During cluster deletion, some BMH resource stuck in "deprovisioning" or "deleting" state
Expected results:
BMH resources should have been removed successfully.
Additional info:
- is related to
-
OCPBUGS-42945 BareMetalHost CR fails to delete on cluster cleanup
- New
-
OCPBUGS-44026 BaremetalHosts stuck in 'provisioning' state despite deploy_step success
- New