-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.16.z
-
Incidents & Support
-
False
-
-
5
-
Important
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Description of problem:
While attempting installation of OpenShift 4.16.36 using RHACM 2.13, provisioning of the nodes completes successfully, however attempts to remove/deprovision nodes fail with ramdisk unable to call back to Ironic. This happens when cleaning is enabled (automatedCleaningMode=metadata) - when it's disabled there is no attempt to boot into ramdisk so deprovisioning completes successfully. During attempts of cluster or node removal, the corresponding BareMetalHost (BMH) enters a deprovisioning state and eventually fails with a cleaning timeout, leaving the host stuck and not removed from inventory. Ironic reports a clean failed state without a detailed error message.
Version-Release number of selected component (if applicable):
RHACM: 2.13.2 OpenShift Container Platform: 4.16.36 Component: metal3-baremetal-operator / Ironic
How reproducible:
Always reproducible on affected nodes when automatedCleaningMode is set to metadata.
Steps to Reproduce:
Deploy a cluster using RHACM 2.13.x with bare-metal nodes managed via Metal³. Configure the BareMetalHost resource with: spec: automatedCleaningMode: metadata Provision the cluster successfully. Attempt to remove the node or deprovision the cluster (for example, by removing the host from infrastructure). Observe the BareMetalHost during deprovisioning.
Actual results:
BareMetalHost remains stuck in deprovisioning state. Ironic transitions the node to clean failed with a timeout: Cleaning ramdisk does not complete successfully. No actionable error details are reported in Ironic or metal3-baremetal-operator logs. Node is not removed from inventory and requires manual intervention.
Expected results:
BareMetalHost should successfully complete cleaning and deprovisioning when automatedCleaningMode: metadata is used. Node should be cleanly removed from infrastructure without manual recovery steps.
Additional info:
Nodes configured with automatedCleaningMode: disabled deprovision correctly in the same environment. Logs consistently show Ironic reporting clean failed due to a cleaning timeout, indicating the cleaning ramdisk is not completing its operation. No customer-specific identifiers, hostnames, IPs, or infrastructure details are required to reproduce this issue.