Bug
Resolution: Not a Bug
Normal
4.18.z
Quality / Stability / Reliability
False
3
Important
Metal Platform 272
1
Description of problem:
Most (10 of 12) BMHs for the cluster are stuck in the deprovisioning state after cluster deletion.
- Cluster is defined/deployed via SiteConfig operator / ClusterInstance CR (ACM 2.13.2)
- Cluster is standard topology (3 cp + 9 workers) with a mix of bare-metal and VM-based nodes
- Cluster failed to install completely: the second control plane node did not register. When I checked, it had booted into the old installation rather than the agent ISO.
- ClusterInstance CR was deleted to attempt a re-install.
- 10 of 12 BMHs are stuck in "deprovisioning" (all have deletionTimestamp set)
- The stuck nodes appear to be booted in the agent ISO from the initial install (16 h prior) based on uptime
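The stuck hosts described above can be listed with a small script. This is a minimal sketch: it assumes the BMH list has been exported with something like `oc get bmh -n cnfdf02 -o json`, and that each host reports its state under `status.provisioning.state`. The second host name and all timestamps in the sample are made up for illustration.

```python
import json


def stuck_deprovisioning(bmh_list_json: str) -> list[str]:
    """Return names of BareMetalHosts that are marked for deletion
    (deletionTimestamp set) but still report the deprovisioning state."""
    items = json.loads(bmh_list_json).get("items", [])
    stuck = []
    for item in items:
        meta = item.get("metadata", {})
        state = item.get("status", {}).get("provisioning", {}).get("state")
        if meta.get("deletionTimestamp") and state == "deprovisioning":
            stuck.append(meta.get("name", "<unnamed>"))
    return sorted(stuck)


# Illustrative input only: "example-host" and both timestamps are hypothetical.
sample = json.dumps({
    "items": [
        {
            "metadata": {
                "name": "cnfdf02-worker-1",
                "namespace": "cnfdf02",
                "deletionTimestamp": "2025-01-01T00:00:00Z",
            },
            "status": {"provisioning": {"state": "deprovisioning"}},
        },
        {
            "metadata": {"name": "example-host", "namespace": "cnfdf02"},
            "status": {"provisioning": {"state": "available"}},
        },
    ]
})

print(stuck_deprovisioning(sample))
```

In practice the JSON would come from the hub cluster rather than from an inline sample, for example by piping the `oc get` output into the script on stdin.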
Every 1000 seconds the baremetal operator logs show the following error for each of the stuck BMHs:
oc logs -n openshift-machine-api metal3-baremetal-operator-56b97d4f7c-8th7c
<snip>
{"level":"error","ts":1746197869.178,"msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","BareMetalHost":{"name":"cnfdf02-worker-1","namespace":"cnfdf02"},"namespace":"cnfdf02","name":"cnfdf02-worker-1","reconcileID":"e8611f55-0e48-4b2d-be15-e9f2c1ba73a7","error":"action \"deprovisioning\" failed: preprovisioningimages.metal3.io \"cnfdf02-worker-1\" is forbidden: unable to create new content in namespace cnfdf02 because it is being terminated","errorVerbose":"preprovisioningimages.metal3.io \"cnfdf02-worker-1\" is forbidden: unable to create new content in namespace cnfdf02 because it is being terminated\naction \"deprovisioning\" failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:236\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1695","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224"}
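Because the same error repeats for every stuck host, it can help to reduce the operator log to one error message per BMH. The following is a rough sketch that assumes each log entry is a single-line JSON object with the `level`, `BareMetalHost` and `error` fields seen in the entry above. The sample entry is a shortened copy of that entry (stack trace fields omitted), and the function name is hypothetical.

```python
import json
from collections import defaultdict


def errors_by_host(log_text: str) -> dict[str, set[str]]:
    """Group distinct error messages by namespace/name of the BareMetalHost."""
    errors = defaultdict(set)
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as "<snip>"
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or wrapped entries
        if entry.get("level") != "error":
            continue
        host = entry.get("BareMetalHost")
        if not isinstance(host, dict):
            continue
        key = f'{host.get("namespace")}/{host.get("name")}'
        errors[key].add(entry.get("error", ""))
    return dict(errors)


# Shortened copy of the log entry above, built with json.dumps to keep the quoting valid.
sample_entry = {
    "level": "error",
    "ts": 1746197869.178,
    "msg": "Reconciler error",
    "controller": "baremetalhost",
    "BareMetalHost": {"name": "cnfdf02-worker-1", "namespace": "cnfdf02"},
    "error": 'action "deprovisioning" failed: preprovisioningimages.metal3.io '
             '"cnfdf02-worker-1" is forbidden: unable to create new content in '
             "namespace cnfdf02 because it is being terminated",
}
sample_log = "<snip>\n" + json.dumps(sample_entry) + "\n"

for host, messages in errors_by_host(sample_log).items():
    for message in messages:
        print(f"{host}: {message}")
```

Fed with the full output of the `oc logs` command above, this would show whether all stuck hosts fail with the same "namespace is being terminated" error or whether some fail for a different reason.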
Version-Release number of selected component (if applicable):
OCP (hub cluster) 4.18.1, ACM 2.13.2, OCP (deployed spoke cluster) 4.16.15
How reproducible:
Unknown
Steps to Reproduce:
See above
Actual results:
Expected results:
Additional info: