Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-70104

BMH provisioning completes but deprovisioning fails when cleaning is enabled (automatedCleaningMode=metadata)

XMLWordPrintable

    • Incidents & Support
    • False
    • Hide

      None

      Show
      None
    • 5
    • Important
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Description of problem:

      
      While attempting installation of OpenShift 4.16.36 using RHACM 2.13, provisioning of the nodes completes successfully, however attempts to remove/deprovision nodes fail with ramdisk unable to call back to Ironic. This happens when cleaning is enabled (automatedCleaningMode=metadata) - when it's disabled there is no attempt to boot into ramdisk so deprovisioning completes successfully.
      
      During attempts of cluster or node removal, the corresponding BareMetalHost (BMH) enters a deprovisioning state and eventually fails with a cleaning timeout, leaving the host stuck and not removed from inventory. Ironic reports a clean failed state without a detailed error message.
      
      

      Version-Release number of selected component (if applicable):

      
      RHACM: 2.13.2
      
      OpenShift Container Platform: 4.16.36
      
      Component: metal3-baremetal-operator / Ironic
      
      

      How reproducible:

      
      Always reproducible on affected nodes when automatedCleaningMode is set to metadata.
      
      

      Steps to Reproduce:

      
      Deploy a cluster using RHACM 2.13.x with bare-metal nodes managed via Metal³.
      
      Configure the BareMetalHost resource with:
      
      spec:
        automatedCleaningMode: metadata
      
      
      Provision the cluster successfully.
      
      Attempt to remove the node or deprovision the cluster (for example, by removing the host from infrastructure).
      
      Observe the BareMetalHost during deprovisioning.
      
      

      Actual results:

      
      BareMetalHost remains stuck in deprovisioning state.
      
      Ironic transitions the node to clean failed with a timeout:
      
      Cleaning ramdisk does not complete successfully.
      
      No actionable error details are reported in Ironic or metal3-baremetal-operator logs.
      
      Node is not removed from inventory and requires manual intervention.
      
      

      Expected results:

      
      BareMetalHost should successfully complete cleaning and deprovisioning when automatedCleaningMode: metadata is used.
      
      Node should be cleanly removed from infrastructure without manual recovery steps.
      
      

      Additional info:

      
      Nodes configured with automatedCleaningMode: disabled deprovision correctly in the same environment.
      
      Logs consistently show Ironic reporting clean failed due to a cleaning timeout, indicating the cleaning ramdisk is not completing its operation.
      
      No customer-specific identifiers, hostnames, IPs, or infrastructure details are required to reproduce this issue.
      
      

              janders@redhat.com Jacob Anders
              rhn-support-mlele Mihir Lele
              None
              None
              Jad Haj Yahya Jad Haj Yahya
              None
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated: