OCPBUGS-59133

BMH stuck in deleting state while performing IBBF


    • Quality / Stability / Reliability
    • False
    • 1
    • Important
    • None
    • Rejected
    • Metal Platform 273, Metal Platform 274
    • 2
    • Done
    • Bug Fix
      Before this update, when a bare-metal host (BMH) was in the `Provisioned` or `ExternallyProvisioned` state, a deletion request first triggered an attempt to deprovision the host or power it off, and a `DataImage` attached to the BMH also prevented deletion. As a result, host removal was blocked or slowed down. With this release, if the BMH has the detached annotation and deletion is requested, the BMH transitions to the deleting state so that it can be deleted directly. (link:https://issues.redhat.com/browse/OCPBUGS-59133[OCPBUGS-59133])

      Description of problem:

          While attempting to perform IBBF with the siteconfig operator, the BMH for the spoke stays in the deleting state for too long because the old BMH is unreachable. As a result, the clusterinstance times out while waiting for the BMH resource to be removed. We should find a way to mitigate this.
      

          Version-Release number of selected component (if applicable):

      4.18.0-0.nightly-2025-04-02-011053 

      How reproducible:

          100%

      Steps to Reproduce:

          1. Install an IBI spoke cluster using the siteconfig operator
          2. Remove the old node and attempt to replace it with a new node through IBBF

      Actual results:

          Reinstallation fails due to timeout

      Expected results:

          Reinstallation succeeds

      Additional info:

      $ oc get clusterinstance target-0 -o json | jq -r '.status.reinstall.conditions[0]'
      {
        "lastTransitionTime": "2025-04-04T17:24:45Z",
        "message": "Encountered error executing task: Deleting rendered manifests. Error: deletion timeout exceeded for object (BareMetalHost:target-0/target-0-0): Timed out waiting to delete object (BareMetalHost:target-0/target-0-0)",
        "reason": "Failed",
        "status": "False",
        "type": "ReinstallRequestProcessed"
      }
      $ oc get bmh target-0-0 -o json | jq -r .status.errorMessage
      Failed to get power state for node 7e063860-56c7-45ca-8bfa-d8f63997bd10. Error: Redfish exception occurred. Error: Resource https://[fd2e:6f44:5dd8:4::1]:8000/redfish/v1/Systems/ac8d214a-ae10-46f5-a179-208341bcb688 not found 
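
The fixed behavior described in the release note (a BMH that carries the detached annotation moves straight to the deleting state when deletion is requested) could be exercised with something like the sketch below. This is a hedged illustration, not the procedure the siteconfig operator follows: the helper name `detach_and_delete_bmh` and the `OC` override variable are hypothetical, the annotation key `baremetalhost.metal3.io/detached` comes from the upstream Metal3 Bare Metal Operator rather than from this ticket, and the sketch assumes a cluster build that already contains the fix.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the ticket: mark a BareMetalHost as
# detached, then request its deletion without blocking on finalizers.
# Usage: detach_and_delete_bmh <namespace> <bmh-name>
# Set OC to another binary (for example kubectl, or echo for a dry run).
detach_and_delete_bmh() {
  # Assumption: the "detached annotation" mentioned in the release note is the
  # upstream Metal3 key baremetalhost.metal3.io/detached (empty value).
  "${OC:-oc}" annotate bmh "$2" -n "$1" baremetalhost.metal3.io/detached="" --overwrite &&
    # --wait=false returns immediately instead of waiting for the object to disappear.
    "${OC:-oc}" delete bmh "$2" -n "$1" --wait=false
}
```

For the host in this report, the call would be `detach_and_delete_bmh target-0 target-0-0`. Running it with `OC=echo` first prints the two commands without touching the cluster.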

              hroy@redhat.com Himanshu Roy
              treywest96 Trey West
              Jad Haj Yahya Jad Haj Yahya
              Bill Gabor Bill Gabor