Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-37796

Some BMH resources stuck in "deprovisioning" or "deleting" state

XMLWordPrintable

    • Important
    • None
    • 3
    • Metal Platform 258, Metal Platform 259, Metal Platform 262
    • 3
    • Rejected
    • False
    • Hide

      None

      Show
      None

      Description of problem:

          When a cluster that is been deployed with Gitops/ZTP approach is being trying to deleted, some of the worker nodes stuck in "deprovisioning" or "deleting" state. When I check logs of metal3-baremetal-operator, I see the following message:
      
      {"level":"info","ts":1722497241.5526164,"logger":"provisioner.ironic","msg":"deleting host","host":"fi-911~storage-2.fi-911.tre.nsn-rdnet.net","ID":"accb0c10-8a95-4208-9e2d-c38d57ec0667","lastError":"Failed to prepare node accb0c10-8a95-4208-9e2d-c38d57ec0667 for cleaning: Validation of image href https://assisted-image-service-multicluster-engine.apps.fi-915.tre.nsn-rdnet.net/byapikey/eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiIwMzY1ZjlkNy0yNjFiLTRmMDUtOWMwYy0xMzNhZjMxZmU1ZDYifQ.4vZ9nNKf9BEv_xxRb_fqEgykwfa4O9Fs-k1kjyBcRjOF3c-qcbDaGMcEkSoOecMhDIFjoOkwTxSoVAKkpUrc0Q/4.16.0/x86_64/minimal.iso failed, reason: Got HTTP code 401 instead of 200 in response to HEAD request.","current":"clean failed","target":"available","deploy step":{"args":{},"interface":"deploy","priority":80,"step":"start_assisted_install"}}
      
      
      This is also effecting installation of cluster next time. When other nodes are booted up with discovery iso, these old nodes which were failing, this time stuck in "inspecting" state in ACM GUI. The nodes that are failing have "automatedCleaningMode: metadata" in their BMH resource.

      Version-Release number of selected component (if applicable):

          4.16.0

      How reproducible:

          Try to delete spoke cluster using Gitops approach and then some of the nodes who has "automatedCleaningMode: metadata" stuck in "deprovisioning state. Also, when you try to deploy the same cluster, these old node who were failing this time stuck in "inspecting" state.

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          During cluster deletion, some BMH resource stuck in "deprovisioning" or "deleting" state

      Expected results:

          BMH resources should have been removed successfully.

      Additional info:

          

            rh-ee-masghar Mahnoor Asghar
            skoksal@redhat.com Sarp Koksal
            Jad Haj Yahya Jad Haj Yahya
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated: