- Bug
- Resolution: Unresolved
- Normal
- 4.18, 4.19, 4.20
- Quality / Stability / Reliability
- False
- 3
- Important
- Metal Platform 280
- 1
Description of problem:
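Downtream bug for upstream bug: https://github.com/metal3-io/baremetal-operator/issues/2478 (by rhn-engineering-dtantsur)
When a cluster is removed while a host is still in the inspection phase, the BareMetalHost moves to the power-off stage ("powering off before delete"). The power-off cannot succeed, because the node state during inspection ("inspect wait") does not allow a power change: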
{"level":"info","ts":1762939414.235443,"logger":"controllers.BareMetalHost","msg":"host ready to be powered off","baremetalhost":{"name":"vsno5","namespace":"vsno5"},"provisioningState":"powering off before delete"}
{"level":"info","ts":1762939414.2354486,"logger":"provisioner.ironic","msg":"ensuring host is powered off (mode: hard)","host":"vsno5~vsno5"}
{"level":"info","ts":1762939414.244772,"logger":"provisioner.ironic","msg":"changing power state","host":"vsno5~vsno5"}
{"level":"info","ts":1762939414.244786,"logger":"provisioner.ironic","msg":"host in state that does not allow power change, try again after delay","host":"vsno5~vsno5","state":"inspect wait","target state":"manageable"}
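To confirm that a stuck deletion matches this bug, the operator log can be searched for the blocked power change. The following is a minimal sketch, assuming the log is available as one JSON object per line, as in the excerpt above; the function name `blocked_power_changes` is hypothetical and not part of baremetal-operator.

```python
import json

# Message emitted by the ironic provisioner in the excerpt above.
BLOCKED_MSG = "host in state that does not allow power change, try again after delay"


def blocked_power_changes(lines):
    """Return (host, state, target state) for every blocked power change."""
    found = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed entries
        if entry.get("msg") == BLOCKED_MSG:
            found.append((entry.get("host"), entry.get("state"), entry.get("target state")))
    return found


# Two entries copied from the excerpt above.
sample = [
    '{"level":"info","ts":1762939414.244772,"logger":"provisioner.ironic","msg":"changing power state","host":"vsno5~vsno5"}',
    '{"level":"info","ts":1762939414.244786,"logger":"provisioner.ironic","msg":"host in state that does not allow power change, try again after delay","host":"vsno5~vsno5","state":"inspect wait","target state":"manageable"}',
]

result = blocked_power_changes(sample)
print(result)
```

A result whose state is "inspect wait" indicates that the delete request arrived while inspection was still in progress.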
This can happen if the cluster is created and then removed very quickly. It can also happen when the inspection phase never finishes because of a misconfiguration.
The deletion of the BMH object then stays stuck until the power-off has been attempted 3 times:
{"level":"info","ts":1762941220.183747,"logger":"provisioner.ironic","msg":"power off error","host":"vsno5~vsno5","msg":"timeout reached while inspecting the node"}
{"level":"info","ts":1762941220.1837687,"logger":"controllers.BareMetalHost","msg":"Giving up on host power off after 3 attempts.","baremetalhost":{"name":"vsno5","namespace":"vsno5"},"provisioningState":"powering off before delete"}
{"level":"info","ts":1762941220.183773,"logger":"controllers.BareMetalHost","msg":"changing provisioning state","baremetalhost":{"name":"vsno5","namespace":"vsno5"},"provisioningState":"powering off before delete","old":"powering off before delete","new":"deleting"}
{"level":"info","ts":1762941220.1837842,"logger":"controllers.BareMetalHost","msg":"saving host status","baremetalhost":{"name":"vsno5","namespace":"vsno5"},"provisioningState":"powering off before delete","operational status":"error","provisioning state":"deleting"}
{"level":"info","ts":1762941220.1923869,"logger":"controllers.BareMetalHost","msg":"publishing event","baremetalhost":{"name":"vsno5","namespace":"vsno5"},"reason":"PowerManagementError","message":"timeout reached while inspecting the node"}
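The timestamps in the two log excerpts give a rough idea of how long the deletion was blocked. The following is a minimal sketch, assuming the "ensuring host is powered off" entry quoted earlier marks the start of the power-off handling (the full log may contain earlier attempts); the function name `stuck_seconds` is hypothetical.

```python
import json

# Entries copied from the log excerpts in this report.
first_attempt = '{"level":"info","ts":1762939414.2354486,"logger":"provisioner.ironic","msg":"ensuring host is powered off (mode: hard)","host":"vsno5~vsno5"}'
gave_up = '{"level":"info","ts":1762941220.1837687,"logger":"controllers.BareMetalHost","msg":"Giving up on host power off after 3 attempts.","baremetalhost":{"name":"vsno5","namespace":"vsno5"},"provisioningState":"powering off before delete"}'


def stuck_seconds(start_line, end_line):
    """Return the seconds elapsed between two JSON log entries, using their "ts" fields."""
    start = json.loads(start_line)["ts"]
    end = json.loads(end_line)["ts"]
    return end - start


elapsed = stuck_seconds(first_attempt, gave_up)
print(f"stuck for about {elapsed / 60:.0f} minutes")  # prints "stuck for about 30 minutes"
```

For these two entries the gap is roughly 1806 seconds, so the BMH stayed in "powering off before delete" for about half an hour before moving to "deleting".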
In a ZTP environment, this timeout causes other controllers (such as the siteconfig controller) to abort the deletion, so the cluster is not removed after the timeout.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info: