Bug
Resolution: Unresolved
Normal
4.19
Quality / Stability / Reliability
False
3
Important
No
Rejected
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Description of problem: In the past we have seen evidence that the servicing retry logic in BMO does not work as expected. For example, it led to this issue: https://issues.redhat.com/browse/OCPBUGS-48789
We worked around it by preventing the condition that triggers this state from occurring, but it is now time to make sure that retry works as expected.
Version-Release number of selected component (if applicable): 4.19
How reproducible: often
Steps to Reproduce:
1. Introduce a condition likely to cause a servicing issue (e.g. attempt servicing on a two-worker cluster) without applying any workarounds (e.g. preemptively powering off the node for 5 minutes so that healthchecks fail and mark the node down).
2. Attempt servicing.
3. Watch for the ICC issue described in the referenced bug.
Actual results: the node gets stuck in servicing error and does not recover.
Expected results: BMO should keep retrying. Once the networking machinery realises that the DNS and router pods on the nodes are down, servicing should get past the ICC issue without any workaround applied.
Additional info: