OpenShift Bugs / OCPBUGS-24266

BareMetalHost CR fails to delete on cluster cleanup


    • Important
    • 1/8: yellow, no fcast as of now

      With ACM 2.9, deleting a cluster (synced by Argo CD) gets stuck. It looks similar to a bug that was fixed some time ago:
      https://issues.redhat.com/browse/OCPBUGS-3029
       
      When a cluster is deleted through ZTP (or Argo CD), all the related objects are deleted at once. The BMH, which is trying to deprovision, gets stuck because it needs something from other objects that have already been deleted.

      How can I reproduce this?
       * Create a cluster with a wrong rootDeviceHints value, or anything else that makes the installation fail once the Assisted Agent comes into play.
       * The cluster installation fails.
       * Delete the cluster with ZTP, which tries to delete all the resources at once.
       * The BMH (and its secrets) get stuck.
       * You have to remove the finalizers manually.
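
The manual workaround in the last step can be sketched as follows. This is a minimal illustration, not a recommended procedure: it only builds the `oc patch` command lines that would clear the finalizers, and does not run them. The namespace, host name, and secret name are hypothetical placeholders, and the helper `finalizer_removal_commands` is not part of any existing tool.

```python
import json
import shlex


def finalizer_removal_commands(namespace, bmh_name, secret_names):
    """Build `oc patch` commands that clear finalizers on a stuck BMH and its secrets.

    All names passed in are placeholders chosen by the caller (assumptions).
    """
    # A JSON merge patch that sets metadata.finalizers to null removes the field.
    patch = json.dumps({"metadata": {"finalizers": None}})
    targets = [("bmh", bmh_name)] + [("secret", name) for name in secret_names]
    return [
        ["oc", "patch", kind, name, "-n", namespace, "--type=merge", "-p", patch]
        for kind, name in targets
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    commands = finalizer_removal_commands(
        "example-cluster", "example-node", ["example-node-bmc-secret"]
    )
    for command in commands:
        print(shlex.join(command))
```

Removing finalizers skips whatever cleanup they were guarding, so this only unblocks the deletion; it does not deprovision the host.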

       
      It is the same scenario that was fixed by:

      [https://github.com/openshift/baremetal-operator/pull/262]

      The problem happens only when the cluster is deployed with `automatedCleaningMode: metadata`. This makes the BMH run deprovisioning, but it is deleted at the same time as the other objects.
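
For reference, the setting in question lives in the BareMetalHost spec. The sketch below builds such an object as a plain Python dictionary and checks whether it explicitly requests metadata cleaning. The host name, namespace, and device path are hypothetical, and both helpers (`make_bmh`, `cleans_on_delete`) are illustrative only. The check looks only at an explicit value in the dictionary and does not model any server-side defaulting.

```python
def make_bmh(name, namespace, cleaning_mode="metadata", root_device="/dev/does-not-exist"):
    """Return a BareMetalHost manifest as a dict (all values are illustrative)."""
    return {
        "apiVersion": "metal3.io/v1alpha1",
        "kind": "BareMetalHost",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The setting this report is about.
            "automatedCleaningMode": cleaning_mode,
            # A deliberately wrong hint, as in the reproduction steps.
            "rootDeviceHints": {"deviceName": root_device},
        },
    }


def cleans_on_delete(bmh):
    """True if the manifest explicitly sets automatedCleaningMode to 'metadata'."""
    return bmh.get("spec", {}).get("automatedCleaningMode") == "metadata"


if __name__ == "__main__":
    host = make_bmh("example-node", "example-cluster")
    print(cleans_on_delete(host))
    print(cleans_on_delete(make_bmh("example-node", "example-cluster", "disabled")))
```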

       

      [Updated on 30th April 2024] Some clarifications after several discussions:

      The key reason for the failure: with 'automatedCleaningMode' enabled, some Ironic stages need to build the 'Preprovimage'. Some of these stages happen sequentially or after an action (such as a delete), and others depend on other conditions. For example, with 'automatedCleaningMode' enabled, more stages need to create the image, such as when you delete a cluster. Because ZTP deletes everything at the same time (including the namespace), the Preprovimage cannot be created. We therefore need a fix for the situation in which the Preprovimage has to be created but cannot be. With 'automatedCleaningMode' enabled, this happens in more stages, so you are more exposed to the bug.

      Do not confuse this with a similar bug, in which creation of the 'Preprovimage' was attempted at the wrong stage: https://issues.redhat.com/browse/OCPBUGS-33048  That bug is not related to 'automatedCleaningMode'; it happens because the image is created when it is not needed. It is true that that issue also surfaces only because ZTP deletes everything at once. Otherwise, the wrong behavior would stay hidden.
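
The ordering problem described above can be illustrated with a toy model. This is not the baremetal-operator's code: the `Namespace` class and the functions are hypothetical, and the model assumes only that a stage needing the image must create an object in the cluster namespace, and that Kubernetes rejects creating new objects in a namespace that is already terminating.

```python
class Namespace:
    """Toy stand-in for a Kubernetes namespace (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.terminating = False
        self.objects = set()

    def create(self, obj):
        # New objects cannot be created in a namespace that is being deleted.
        if self.terminating:
            raise RuntimeError(f"namespace {self.name} is terminating; cannot create {obj}")
        self.objects.add(obj)


def deprovision_bmh(ns, needs_image):
    """Return True if the BMH can finish deprovisioning, False if it gets stuck."""
    if not needs_image:
        return True
    try:
        ns.create("PreprovisioningImage")
    except RuntimeError:
        return False
    return True


def delete_all_at_once(needs_image):
    """ZTP-style deletion: the namespace is already terminating when the BMH deprovisions."""
    ns = Namespace("example-cluster")
    ns.terminating = True
    return deprovision_bmh(ns, needs_image)


def delete_bmh_first(needs_image):
    """Ordered deletion: the BMH finishes before the namespace is removed."""
    ns = Namespace("example-cluster")
    finished = deprovision_bmh(ns, needs_image)
    ns.terminating = True
    return finished


if __name__ == "__main__":
    print("all at once:", delete_all_at_once(needs_image=True))
    print("BMH first:  ", delete_bmh_first(needs_image=True))
```

In this model the deletion only gets stuck when a stage needs the image while the namespace is terminating, which matches the observation that more image-building stages mean more exposure to the bug.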


            rhn-engineering-hpokorny Honza Pokorny
            jgato@redhat.com Jose Gato Luis
            Jad Haj Yahya Jad Haj Yahya