OpenShift Bugs / OCPBUGS-55675

BMH stuck deprovisioning


    • Quality / Stability / Reliability
    • False
    • 3
    • Important
    • Metal Platform 272
    • 1

      Description of problem:

      Most (10 of 12) of the cluster's BareMetalHost (BMH) resources are stuck in the deprovisioning state after the cluster was deleted.

      • The cluster is defined and deployed via the SiteConfig operator with a ClusterInstance CR (ACM 2.13.2).
      • The cluster uses the standard topology (3 control plane + 9 worker nodes) with a mix of bare-metal and VM-based nodes.
      • The cluster did not finish installing: the second control plane node did not register. When I checked, it had booted into the old installation rather than the agent ISO.
      • The ClusterInstance CR was deleted to attempt a reinstall.
      • 10 of 12 BMHs are stuck in "deprovisioning" (all have a deletionTimestamp).
      • Based on uptime, the stuck nodes appear to still be booted into the agent ISO from the initial install (16 h earlier).
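      For anyone triaging a similar situation, the check described above (hosts that carry a deletionTimestamp but still report "deprovisioning") can be sketched as a small filter over the JSON produced by `oc get bmh -n cnfdf02 -o json`. This is only an illustrative sketch: the function name is hypothetical, and it assumes the provisioning state is reported under `status.provisioning.state`, as in the metal3 BareMetalHost API.

```python
from typing import Any


def stuck_deprovisioning_hosts(bmh_list: dict[str, Any]) -> list[str]:
    """Return sorted names of hosts that are marked for deletion
    but still report the "deprovisioning" provisioning state.

    `bmh_list` is the parsed JSON of a BareMetalHost list
    (for example, the output of `oc get bmh -n <ns> -o json`).
    """
    stuck = []
    for item in bmh_list.get("items", []):
        meta = item.get("metadata", {})
        # Assumption: the state lives at status.provisioning.state.
        state = item.get("status", {}).get("provisioning", {}).get("state")
        if meta.get("deletionTimestamp") and state == "deprovisioning":
            stuck.append(meta.get("name", "<unnamed>"))
    return sorted(stuck)
```

      Feeding it the parsed output for the `cnfdf02` namespace would be expected to list the 10 stuck hosts, assuming the field layout above holds for this cluster.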

      Every 1000 seconds, the baremetal-operator logs show the following error for each stuck BMH:

      oc logs -n openshift-machine-api metal3-baremetal-operator-56b97d4f7c-8th7c 
      <snip>
      {"level":"error","ts":1746197869.178,"msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","BareMetalHost":{"name":"cnfdf02-worker-1","namespace":"cnfdf02"},"namespace":"cnfdf02","name":"cnfdf02-worker-1","reconcileID":"e8611f55-0e48-4b2d-be15-e9f2c1ba73a7","error":"action \"deprovisioning\" failed: preprovisioningimages.metal3.io \"cnfdf02-worker-1\" is forbidden: unable to create new content in namespace cnfdf02 because it is being terminated","errorVerbose":"preprovisioningimages.metal3.io \"cnfdf02-worker-1\" is forbidden: unable to create new content in namespace cnfdf02 because it is being terminated\naction \"deprovisioning\" failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:236\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1695","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224"}
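      To confirm the 1000-second retry pattern across all stuck hosts, the structured log lines can be grouped per host and the gaps between their `ts` values computed. The following is a rough sketch, not part of any operator tooling: both function names are hypothetical, and it assumes each log entry is a single-line JSON object with the `msg`, `ts`, `error`, and `BareMetalHost` fields seen in the entry above.

```python
import json
from collections import defaultdict


def reconcile_errors_by_host(log_lines):
    """Group "Reconciler error" entries as {"namespace/name": [(ts, error), ...]}."""
    errors = defaultdict(list)
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as "<snip>"
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or wrapped lines
        if entry.get("msg") != "Reconciler error":
            continue
        host = entry.get("BareMetalHost", {})
        key = f'{host.get("namespace")}/{host.get("name")}'
        errors[key].append((entry.get("ts"), entry.get("error")))
    return dict(errors)


def retry_intervals(timestamps):
    """Return the gaps in seconds between consecutive timestamps."""
    ordered = sorted(timestamps)
    return [round(later - earlier, 3) for earlier, later in zip(ordered, ordered[1:])]
```

      The log lines would come from saving the `oc logs` output shown above to a file; each host's list of `ts` values can then be passed to `retry_intervals` to check whether the gaps are consistently close to 1000 seconds.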
      

      Version-Release number of selected component (if applicable):

      OCP (hub cluster) 4.18.1
      ACM 2.13.2
      OCP (deployed spoke cluster) 4.16.15

      How reproducible:

      unknown

      Steps to Reproduce:

      See above

      Actual results:

          

      Expected results:

          

      Additional info:

          

       

              imelofer Iury Gregory Melo Ferreira
              rhn-support-imiller Ian Miller
              Jad Haj Yahya