- Bug
- Resolution: Not a Bug
- Major
- None
- 4.13, 4.10
- No
- Rejected
- False
Description of problem:
IHAC referring to https://bugzilla.redhat.com/show_bug.cgi?id=1943564 and https://bugzilla.redhat.com/show_bug.cgi?id=1903228, bugs that were fixed in OCP 4.9, but they are still facing the issue.
They are running Compliance Operator scans that involve a node reboot. During the reboot, the node drain stalled because one of the pods stayed in the `terminating` state.
master: 'pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node master3.ocp4.XXX.mcp is reporting: \"failed to drain node : master3.ocp4.XXX.mcp after 1 hour\""'

2023-03-04T03:39:22.686744936Z I0304 03:39:22.686701 4844 drain.go:91] Draining failed with: error when waiting for pod "container-registry-79648798c4-9m68m" terminating: global timeout reached: 1m30s, retrying
2023-03-04T03:41:54.938201702Z I0304 03:41:54.938161 4844 drain.go:91] Draining failed with: error when waiting for pod "container-registry-79648798c4-9m68m" terminating: global timeout reached: 1m30s, retrying
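For context, the degraded pool and the repeating drain errors above can be cross-checked from the MachineConfigPool status and the machine-config-daemon logs. A minimal sketch, assuming the `oc` CLI and the default `openshift-machine-config-operator` namespace (the pod name below is a placeholder):

```
# Show which MachineConfigPool is degraded and the node sync error
oc get mcp
oc describe mcp master

# Locate the machine-config-daemon pod running on the affected node
oc get pods -n openshift-machine-config-operator -o wide | grep machine-config-daemon

# Tail its logs for the "Draining failed" messages shown above
oc logs -n openshift-machine-config-operator <mcd-pod-name> -c machine-config-daemon
```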
The customer is not OK with the workaround and is looking for a permanent fix.
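For clarity, the workaround referred to here is presumably the manual one from the linked Bugzillas: force-deleting the pod that is stuck terminating so the drain can proceed. A hedged sketch; the pod name is taken from the log above, and the namespace is an assumption that must be looked up first:

```
# Find the namespace of the stuck pod (it shows as Terminating)
oc get pods -A | grep container-registry-79648798c4-9m68m

# Force-delete it so the machine-config-daemon can finish the drain
oc delete pod container-registry-79648798c4-9m68m -n <namespace> --force --grace-period=0
```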
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Node drain fails when a pod on the node is stuck in the terminating state.
Expected results:
It should remove the pod stuck in the terminating state and complete the node drain.
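To make "stuck in the terminating state" concrete: such pods have `metadata.deletionTimestamp` set but never get removed, typically because a finalizer is not cleared or the kubelet/container runtime on the node is unresponsive. A sketch for listing them on the affected node, assuming `jq` is available (the node name is a placeholder):

```
# List pods on the node that are terminating (deletionTimestamp set) and show their finalizers
oc get pods -A --field-selector spec.nodeName=<node-name> -o json \
  | jq -r '.items[]
           | select(.metadata.deletionTimestamp != null)
           | "\(.metadata.namespace)/\(.metadata.name) finalizers=\(.metadata.finalizers)"'
```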
Additional info:
I will share the sos report and must-gather soon.