OpenShift Bugs / OCPBUGS-37796

Some BMH resources stuck in "deprovisioning" or "deleting" state


    • Important
    • None
    • 3
    • Metal Platform 258, Metal Platform 259, Metal Platform 263
    • 3
    • Rejected
    • False

      Description of problem:

          When a cluster that was deployed with the GitOps/ZTP approach is being deleted, some of the worker nodes get stuck in the "deprovisioning" or "deleting" state. When I check the logs of metal3-baremetal-operator, I see the following message:
      
      {"level":"info","ts":1722497241.5526164,"logger":"provisioner.ironic","msg":"deleting host","host":"fi-911~storage-2.fi-911.tre.nsn-rdnet.net","ID":"accb0c10-8a95-4208-9e2d-c38d57ec0667","lastError":"Failed to prepare node accb0c10-8a95-4208-9e2d-c38d57ec0667 for cleaning: Validation of image href https://assisted-image-service-multicluster-engine.apps.fi-915.tre.nsn-rdnet.net/byapikey/eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiIwMzY1ZjlkNy0yNjFiLTRmMDUtOWMwYy0xMzNhZjMxZmU1ZDYifQ.4vZ9nNKf9BEv_xxRb_fqEgykwfa4O9Fs-k1kjyBcRjOF3c-qcbDaGMcEkSoOecMhDIFjoOkwTxSoVAKkpUrc0Q/4.16.0/x86_64/minimal.iso failed, reason: Got HTTP code 401 instead of 200 in response to HEAD request.","current":"clean failed","target":"available","deploy step":{"args":{},"interface":"deploy","priority":80,"step":"start_assisted_install"}}
      
      
      This also affects the next installation of the cluster. When the other nodes are booted with the discovery ISO, the old nodes that previously failed get stuck in the "inspecting" state in the ACM GUI. The failing nodes have "automatedCleaningMode: metadata" in their BMH resource.
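For triage, the relevant fields can be pulled out of log entries like the one quoted above. The following is a minimal sketch, assuming each metal3-baremetal-operator log line is a single JSON object with the keys seen in that entry ("host", "ID", "lastError", "current", "target"). The helper name summarize_cleaning_failure is hypothetical and is not part of metal3 or Ironic.

```python
import json
import re

# Matches the HTTP status reported in the Ironic error text quoted in this report.
HTTP_CODE_RE = re.compile(r"Got HTTP code (\d+) instead of (\d+)")


def summarize_cleaning_failure(log_line):
    """Return a summary dict for a 'clean failed' log entry, or None.

    Hypothetical helper: assumes the line is one JSON object shaped like
    the entry quoted in this report.
    """
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # not a JSON log line
    if not isinstance(entry, dict):
        return None
    last_error = entry.get("lastError") or ""
    if entry.get("current") != "clean failed" or not last_error:
        return None
    match = HTTP_CODE_RE.search(last_error)
    return {
        "host": entry.get("host"),
        "node_id": entry.get("ID"),
        "target": entry.get("target"),
        # HTTP status returned for the image href, if the error mentions one
        "http_code": int(match.group(1)) if match else None,
    }
```

Feeding the operator log through this function line by line and keeping the non-None results would list the hosts whose cleaning failed, together with the HTTP status returned for the image URL (401 in the entry above).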

      Version-Release number of selected component (if applicable):

          4.16.0

      How reproducible:

          Try to delete a spoke cluster using the GitOps approach; some of the nodes that have "automatedCleaningMode: metadata" get stuck in the "deprovisioning" state. Also, when you try to deploy the same cluster again, the old nodes that were failing get stuck in the "inspecting" state.
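To check which hosts match the pattern described above, the BMH list can be filtered by cleaning mode and provisioning state. This is a minimal sketch, assuming the input is the parsed JSON of a BareMetalHost list (for example, the output of `oc get bmh -A -o json`) and that the state is exposed at status.provisioning.state. The function name find_stuck_hosts and the STUCK_STATES set are hypothetical and only mirror the states named in this report.

```python
# States named in this report as the ones hosts get stuck in.
STUCK_STATES = {"deprovisioning", "deleting", "inspecting"}


def find_stuck_hosts(bmh_list, cleaning_mode="metadata"):
    """Return (namespace, name, state) for hosts in a stuck-looking state.

    Hypothetical helper: bmh_list is assumed to be a dict with an "items"
    list of BareMetalHost objects, as produced by a JSON list export.
    """
    stuck = []
    for item in bmh_list.get("items", []):
        spec = item.get("spec", {})
        state = item.get("status", {}).get("provisioning", {}).get("state")
        if spec.get("automatedCleaningMode") == cleaning_mode and state in STUCK_STATES:
            meta = item.get("metadata", {})
            stuck.append((meta.get("namespace"), meta.get("name"), state))
    return stuck
```

A host appearing in the result is only a candidate: a state such as "deprovisioning" is also reported during a normal deletion, so the check is meaningful only if the host stays in that state over repeated runs.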

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          During cluster deletion, some BMH resources get stuck in the "deprovisioning" or "deleting" state.

      Expected results:

          BMH resources should have been removed successfully.

      Additional info:

          

              rpittau@redhat.com Riccardo Pittau
              skoksal@redhat.com Sarp Koksal
              Jad Haj Yahya Jad Haj Yahya