OpenShift Bugs / OCPBUGS-67294

ACM 2.15.0: Worker node scale-in fails: BareMetalHost stuck in deleting due to Ironic cleaning error: missing uefi_esp.img

    • Metal Platform 281
    • 1
    • In Progress
    • Release Note Not Required

      Description
      Attempting to scale in a worker node using the ACM 2.15 documented procedure results in the BareMetalHost (BMH) entering the deleting state and failing to progress. Ironic reports a provisioning/cleaning error:

      Failed to prepare node <UUID> for cleaning:
      Validation of image href file:///templates/uefi_esp.img failed,
      reason: Specified image file not found.
      

      The node is successfully deleted from the spoke cluster (verified via oc get nodes), but the corresponding BMH on the hub remains in the deleting state with a provisioning error for an extended period. It is eventually removed, but only after several failed retries.

      This indicates the Ironic cleaning operation expects an EFI template file (uefi_esp.img) that is missing from the container or environment.
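
      When triaging a large volume of logs, the node and the failing href can be pulled out of the message programmatically. The following is a minimal Python sketch, assuming the message has the shape quoted above; the function name parse_cleaning_failure is hypothetical and not part of Ironic, and the wording may differ in other Ironic versions.

```python
import re
from typing import Optional

# Pattern modelled on the message quoted in this report. The exact wording
# in other Ironic versions may differ (assumption).
_CLEANING_FAILURE = re.compile(
    r"Failed to prepare node (?P<node>\S+) for cleaning:\s*"
    r"Validation of image href (?P<href>\S+) failed,\s*"
    r"reason: (?P<reason>.+?)\s*$",
    re.DOTALL,
)


def parse_cleaning_failure(message: str) -> Optional[dict]:
    """Return node, href and reason from a cleaning failure message, or None."""
    match = _CLEANING_FAILURE.search(message)
    return match.groupdict() if match else None
```

      Grouping the parsed results by href makes it easy to confirm that every retry fails on the same template file.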

      Environment / Version

      • ACM: 2.15.0
      • Platform: Bare metal (Dell servers — hostnames redacted)

      Steps to Reproduce

      Follow the ACM documentation for worker node scale-in:
      https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.15/html/multicluster_engine_operator_with_red_hat_advanced_cluster_management/ibio-intro#scale-in-worker-nodes

      Mark a worker node for scale-in.

      Observe that the node is removed from the spoke cluster.

      Check the BareMetalHost object on the hub cluster — BMH transitions to deleting but remains stuck.

      Inspect the BMO / Ironic logs and see repeated cleaning failures reporting that file:///templates/uefi_esp.img was not found.
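
      The BMH check in the steps above can be scripted against the JSON form of the object (for example, the output of oc get bmh <name> -n <namespace> -o json on the hub). This is a minimal sketch, assuming the Metal3 BareMetalHost fields metadata.deletionTimestamp, status.provisioning.state, status.errorCount and status.errorMessage; the helper names summarize_bmh and looks_stuck are hypothetical.

```python
import json


def summarize_bmh(bmh_json: str) -> dict:
    """Extract the fields relevant to a stuck deletion from a BMH JSON document."""
    bmh = json.loads(bmh_json)
    metadata = bmh.get("metadata", {})
    status = bmh.get("status", {})
    return {
        "name": metadata.get("name"),
        # Kubernetes sets deletionTimestamp once deletion has been requested.
        "deletion_requested": "deletionTimestamp" in metadata,
        "state": status.get("provisioning", {}).get("state"),
        "error_count": status.get("errorCount", 0),
        "error_message": status.get("errorMessage", ""),
    }


def looks_stuck(summary: dict) -> bool:
    """Heuristic (assumption): deletion was requested and an error is reported."""
    return summary["deletion_requested"] and bool(summary["error_message"])
```

      A rising error_count across successive checks, combined with looks_stuck returning True, matches the behavior described in this report.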

      Actual Results

      BMH remains stuck in deleting with provisioning error.

      Ironic repeatedly fails cleaning due to missing uefi_esp.img.

      Multiple attempts were made to set maintenance mode and retry, but deletion did not proceed normally.

      Example Ironic logs:

      Failed to prepare node <UUID> for cleaning:
      Validation of image href file:///templates/uefi_esp.img failed,
      reason: Specified image file not found.

      Expected Results

      Worker node should scale in cleanly.

      Ironic cleaning should complete without missing template errors.

      BMH should move from deleting → removed without manual intervention or long delays.

      Business Impact
      This is delaying the customer’s 4.20 rollout because node lifecycle operations cannot be performed reliably.

      Additional Information

      The node successfully disappears from the spoke cluster’s node list, so the failure is specific to the Ironic/BMH cleanup path.

      The missing EFI template suggests a packaging issue, regression in cleaning workflows, or an incorrect assumption about file layout inside the relevant container image.
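
      One way to test the file layout hypothesis is to check whether the href resolves to a regular file in the filesystem where Ironic runs. The sketch below is a minimal illustration meant to be run inside the relevant container; it only approximates the validation and is not Ironic's own code, and the helper name file_href_exists is hypothetical.

```python
import os
from urllib.parse import unquote, urlparse


def file_href_exists(href: str) -> bool:
    """Return True if a file:// href points at an existing regular file."""
    parsed = urlparse(href)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// href: {href!r}")
    # For file:///templates/uefi_esp.img the path is /templates/uefi_esp.img.
    return os.path.isfile(unquote(parsed.path))
```

      Calling file_href_exists("file:///templates/uefi_esp.img") in the container and getting False would be consistent with the packaging explanation.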

      Full logs available upon request (hostnames redacted for security).

              rpittau@redhat.com Riccardo Pittau
              rhn-support-mlele Mihir Lele
              Jad Haj Yahya