Story
Resolution: Unresolved
Improvement
If a node is shut down while the MCO is executing an "rpm-ostree cleanup" command, the MCP becomes degraded.
This scenario is described here: https://issues.redhat.com/browse/OCPBUGS-58099
and here: https://redhat-internal.slack.com/archives/GH7G2MANS/p1750857107077139
This scenario often happens when we scale up a node, check that it is healthy, and then scale it down and remove it. On rare, random occasions these actions degrade the pool because of the issue described above, and the test fails.
Before scaling down a node, we need to make sure that it is not executing this command. One option is to check that the node has added the "SIGTERM" protection before we scale it down.
We should try to avoid solving this with a sleep instruction, but if there is no other option, a sleep is still better than having the tests fail randomly.
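
As an illustration of the alternative to a fixed sleep, here is a minimal sketch of a bounded polling helper. It is not taken from the MCO test code: the name `wait_until`, its parameters, and the default timeout and interval are assumptions made for this example. The actual readiness check (for instance, whether the node has added the "SIGTERM" protection) is left to the caller, because how to detect it reliably is exactly what this story needs to determine.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout_s: float = 300.0,
               interval_s: float = 5.0) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True as soon as the condition holds, or False on timeout.
    The condition is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        # Short pause between checks; the total wait is bounded by timeout_s.
        time.sleep(interval_s)
```

A test could then call something like `wait_until(lambda: node_is_safe_to_scale_down(node_name))` before scaling the node down, where `node_is_safe_to_scale_down` is a hypothetical check that still has to be written. Unlike a fixed sleep, this returns as soon as the node is ready and reports a clear failure when the timeout expires.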